PREFACE

We -- or the Black Chamber -- have a little agreement with [Knuth];
he doesn’t publish the real Volume 4 of The Art of Computer Programming,
and they don’t render him metabolically challenged.

-- CHARLES STROSS, The Atrocity Archive (2001)


This booklet contains draft material that I’m circulating to experts in the 
field, in hopes that they can help remove its most egregious errors before too 
many other people see it. I am also, however, posting it on the Internet for 
courageous and/or random readers who don’t mind the risk of reading a few 
pages that have not yet reached a very mature state. Beware: This material 
has not yet been proofread as thoroughly as the manuscripts of Volumes 1, 2,
3, and 4A were at the time of their first printings. And those carefully-checked
volumes, alas, were subsequently found to contain thousands of mistakes.

Given this caveat, I hope that my errors this time will not be so numerous 
and/or obtrusive that you will be discouraged from reading the material carefully. 
I did try to make the text both interesting and authoritative, as far as it goes. 
But the field is vast; I cannot hope to have surrounded it enough to corral it 
completely. So I beg you to let me know about any deficiencies that you discover. 

To put the material in context, this pre-fascicle contains an exposition of 
mathematical material (mostly about probability theory) that I plan to include 
at the beginning of Volume 4B. Its raison d’être is explained below, in an excerpt
from the preface to that volume. 


* * *


Probability theory has made huge strides since I “completed” my college 
education in 1963; hence I’m basically self-taught with respect to these newfangled
ideas, and I fear that in many respects my knowledge lags behind that
of today’s students. I’ve tried my best to get the story right, yet I fear that in 
many respects I’m woefully ignorant. 

For example, I urgently need your help with respect to some exercises that I
made up as I was preparing this material. I certainly don’t like to receive credit
for things that have already been published by others, and most of these results
are quite natural “fruits” that were just waiting to be “plucked.” Therefore
please tell me if you know who deserves to be credited, with respect to the ideas
found in exercises 6, 8, 9, 19, 32, 33, 38, 73, 88, or 96. 


* * *

Special thanks are due to Persi Diaconis, Omid Etesami, Svante Janson, Sheldon
Ross, Ernst Schulte-Geers, and ... for their detailed comments on my early
attempts at exposition, as well as to numerous other correspondents who have 
contributed crucial corrections. 




* * *

I happily offer a “finder’s fee” of $2.56 for each error in this draft when it is first
reported to me, whether that error be typographical, technical, or historical.
The same reward holds for items that I forgot to put in the index. And valuable
suggestions for improvements to the text are worth 32¢ each. (Furthermore, if
you find a better solution to an exercise, I’ll actually do my best to give you
immortal glory, by publishing your name in the eventual book:-)

Cross references to yet-unwritten material sometimes appear as ‘00’; this
impossible value is a placeholder for the actual numbers to be supplied later.

Happy reading! 

Stanford, California                                        D. E. K.
21 October 2012


Part of the Preface to Volume 4B 

During the years that I’ve been preparing Volume 4, I’ve often run across
basic techniques of probability theory that I would have put into Section 1.2
of Volume 1 if I’d been clairvoyant enough to anticipate them in the 1960s.
Finally I realized that I ought to collect most of them together in one place,
near the beginning of Volume 4B, because the story of these developments is too
interesting to be broken up into little pieces scattered here and there.

Therefore this volume begins with a special section entitled “Mathematical
Preliminaries Redux,” and future sections use the abbreviation ‘MPR’ to refer
to its equations and its exercises.
In books of this nature I can only suggest you keep it
as simple as the subject will allow.

-- KODE VICIOUS (2012)

MATHEMATICAL PRELIMINARIES REDUX


Many parts of this book deal with discrete probabilities, namely with a finite or
countably infinite set $\Omega$ of atomic events $\omega$, each of which has a given probability
$\Pr(\omega)$, where

$$0 \le \Pr(\omega) \le 1 \qquad\text{and}\qquad \sum_{\omega\in\Omega}\Pr(\omega) = 1. \tag{1}$$

This set $\Omega$, together with the function Pr, is called a “probability space.” For
example, $\Omega$ might be the set of all ways to shuffle a pack of 52 playing cards,
with $\Pr(\omega) = 1/52!$ for every such arrangement.

An event is, intuitively, a proposition that can be either true or false with
certain probability. It might, for instance, be the statement “the top card is an
ace,” with probability 1/13. Formally, an event $A$ is a subset of $\Omega$, namely the
set of all atomic events for which the corresponding proposition $A$ is true; and

$$\Pr(A) = \sum_{\omega\in A}\Pr(\omega) = \sum_{\omega\in\Omega}\Pr(\omega)\,[\omega\in A]. \tag{2}$$


A random variable is a function that assigns a value to every atomic event.
We typically use uppercase letters for random variables, and lowercase letters
for the values that they might assume; thus, we might say that the probability
of the event $X = x$ is $\Pr(X = x) = \sum_{\omega\in\Omega}\Pr(\omega)\,[X(\omega) = x]$. In our playing card
example, the top card $T$ is a random variable, and we have $\Pr(T = \mathrm{Q}\spadesuit) = 1/52$.
(Sometimes, as here, the lowercase-letter convention is ignored.)

The random variables $X_1$, …, $X_k$ are said to be independent if

$$\Pr(X_1 = x_1 \text{ and } \cdots \text{ and } X_k = x_k) = \Pr(X_1 = x_1)\,\ldots\,\Pr(X_k = x_k) \tag{3}$$

for all $(x_1,\ldots,x_k)$. For example, if $F$ and $S$ denote the face value and suit of
the top card $T$, clearly $F$ and $S$ are independent. Hence in particular we have
$\Pr(T = \mathrm{Q}\spadesuit) = \Pr(F = \mathrm{Q})\Pr(S = \spadesuit)$. But $T$ is not independent of the bottom
card, $B$; indeed, we have $\Pr(T = t \text{ and } B = b) \ne 1/52^2$ for any cards $t$ and $b$.
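The card example can be checked by brute force. The sketch below is only an illustration under a simplifying assumption: instead of all 52! shuffles it enumerates the ordered pairs (top card, bottom card), which are equally likely under a uniform shuffle. The card encoding and the helper `pr` are hypothetical names, not notation from the text.

```python
from fractions import Fraction
from itertools import permutations

FACES = "A23456789TJQK"
SUITS = "CDHS"
DECK = [f + s for f in FACES for s in SUITS]  # 52 cards, e.g. "QS" = queen of spades

# Assumed marginal space: ordered (top, bottom) pairs of distinct cards, equally likely.
OMEGA = list(permutations(DECK, 2))
PR = Fraction(1, len(OMEGA))  # 1/(52*51)

def pr(event):
    """Probability of the set of atomic events that satisfy the predicate `event`."""
    return sum(PR for w in OMEGA if event(w))

p_top_qs = pr(lambda w: w[0] == "QS")             # Pr(T = queen of spades)
p_face_q = pr(lambda w: w[0][0] == "Q")           # Pr(F = Q)
p_suit_s = pr(lambda w: w[0][1] == "S")           # Pr(S = spades)
p_top_bottom = pr(lambda w: w == ("QS", "AH"))    # Pr(T = t and B = b) for one pair

print(p_top_qs, p_face_q * p_suit_s, p_top_bottom)
```

The first two printed values agree, as Eq. (3) requires for F and S, while the third differs from 1/52², reflecting the dependence between T and B.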

A system of $n$ random variables is called $k$-wise independent if no $k$ of
its variables are dependent. With pairwise (2-wise) independence, for example,
we could have variable $X$ independent of variable $Y$, variable $Y$ independent of $Z$, and
variable $Z$ independent of $X$; yet all three variables needn’t be independent
(see exercise 6). Similarly, $k$-wise independence does not imply $(k+1)$-wise
independence. But $(k+1)$-wise independence does imply $k$-wise independence.
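A standard illustration of pairwise independence without full independence takes two fair bits and their exclusive-or. This is an assumed textbook example, not necessarily the construction of exercise 6; the sketch tests Eq. (3) directly on every pair and on the triple.

```python
from fractions import Fraction
from itertools import product, combinations

# Atomic events (x, y, z): x and y are fair coin flips and z = x XOR y.
OMEGA = {(x, y, x ^ y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

def pr(pred):
    return sum(p for w, p in OMEGA.items() if pred(w))

def independent(indices):
    """Check Eq. (3) for the variables at the given coordinate positions."""
    for values in product((0, 1), repeat=len(indices)):
        joint = pr(lambda w: all(w[i] == v for i, v in zip(indices, values)))
        prod = Fraction(1)
        for i, v in zip(indices, values):
            prod *= pr(lambda w, i=i, v=v: w[i] == v)
        if joint != prod:
            return False
    return True

pairwise = all(independent(pair) for pair in combinations(range(3), 2))
mutual = independent((0, 1, 2))
print(pairwise, mutual)
```

Every pair passes the test, but the triple fails because the event (0, 0, 1) has probability 0 rather than 1/8.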

The conditional probability of an event $A$ given an event $B$ is

$$\Pr(A \mid B) = \frac{\Pr(A\cap B)}{\Pr(B)} = \frac{\Pr(A \text{ and } B)}{\Pr(B)} \tag{4}$$




when $\Pr(B) > 0$, otherwise it’s $\Pr(A)$. Imagine breaking the whole probability
space $\Omega$ into two parts, $\Omega' = B$ and $\Omega'' = \bar B = \Omega\setminus B$, with $\Pr(\Omega') = \Pr(B)$ and
$\Pr(\Omega'') = 1 - \Pr(B)$. If we assign new probabilities to atomic events by the rules

$$\Pr{}'(\omega) = \Pr(\omega\mid\Omega') = \frac{\Pr(\omega)\,[\omega\in\Omega']}{\Pr(\Omega')}, \qquad \Pr{}''(\omega) = \Pr(\omega\mid\Omega'') = \frac{\Pr(\omega)\,[\omega\in\Omega'']}{\Pr(\Omega'')},$$

we obtain new probability spaces $\Omega'$ and $\Omega''$, allowing us to contemplate a world
where $B$ is always true and another world where $B$ is always false. It’s like taking
two branches in a tree, each of which has its own logic. Conditional probability is
important for the analysis of algorithms because algorithms often get into different
states where different probabilities are relevant. Notice that we always have

$$\Pr(A) = \Pr(A\mid B)\cdot\Pr(B) + \Pr(A\mid\bar B)\cdot\Pr(\bar B). \tag{5}$$

The events $A_1$, …, $A_k$ are said to be independent if the random variables
$[A_1]$, …, $[A_k]$ are independent. (Bracket notation applies in the usual way to
events-as-statements, not just to events-as-subsets: $[A] = 1$ if $A$ is true, otherwise
$[A] = 0$.) Exercise 20 proves that this happens if and only if

$$\Pr\Bigl(\bigcap_{j\in J} A_j\Bigr) = \prod_{j\in J}\Pr(A_j), \qquad\text{for all } J\subseteq\{1,\ldots,k\}. \tag{6}$$

In particular, events $A$ and $B$ are independent if and only if $\Pr(A\mid B) = \Pr(A)$.

When the values of a random variable $X$ are real numbers or complex
numbers, we’ve defined its expected value $\mathrm{E}\,X$ in Section 1.2.10: We said that

$$\mathrm{E}\,X = \sum_{\omega\in\Omega} X(\omega)\Pr(\omega) = \sum_x x\Pr(X = x), \tag{7}$$

provided that this definition makes sense when the sums are taken over infinitely
many nonzero values. (The sum should be absolutely convergent.) A simple but
extremely important case arises when $A$ is any event, and when $X = [A]$ is a
binary random variable representing the truth of that event; then

$$\mathrm{E}\,[A] = \sum_{\omega\in\Omega}[A](\omega)\Pr(\omega) = \sum_{\omega\in\Omega}[\omega\in A]\Pr(\omega) = \sum_{\omega\in A}\Pr(\omega) = \Pr(A). \tag{8}$$

We’ve also noted that the expectation of a sum, $\mathrm{E}(X_1 + \cdots + X_k)$, always
equals the sum of the expectations, $(\mathrm{E}\,X_1) + \cdots + (\mathrm{E}\,X_k)$, whether or not the
random variables $X_j$ are independent. Furthermore the expectation of a product,
$\mathrm{E}\,X_1\ldots X_k$, is the product of the expectations, $(\mathrm{E}\,X_1)\ldots(\mathrm{E}\,X_k)$, if those variables
do happen to be independent. In Section 3.3.2 we defined the covariance,

$$\mathrm{covar}(X,Y) = \mathrm{E}\bigl((X - \mathrm{E}\,X)(Y - \mathrm{E}\,Y)\bigr) = (\mathrm{E}\,XY) - (\mathrm{E}\,X)(\mathrm{E}\,Y), \tag{9}$$

which tends to measure the way $X$ and $Y$ depend on each other. The variance,
$\mathrm{var}(X)$, is $\mathrm{covar}(X, X)$; the middle formula in (9) shows why it is nonnegative
whenever the random variable $X$ takes on only real values.

All of these notions of expected value carry over to conditional expectation,

$$\mathrm{E}(X\mid A) = \sum_{\omega\in A} X(\omega)\frac{\Pr(\omega)}{\Pr(A)} = \sum_x x\,\frac{\Pr(X = x \text{ and } A)}{\Pr(A)}, \tag{10}$$

conditioned on any event $A$, when we want to work in the probability space for
which $A$ is true. One of the most important formulas, analogous to (5), is

$$\mathrm{E}\,X = \sum_y \mathrm{E}(X\mid Y = y)\Pr(Y = y) = \sum_y\sum_x x\Pr(X = x\mid Y = y)\Pr(Y = y). \tag{11}$$


Furthermore there’s also another important kind of conditional expectation:
When $X$ and $Y$ are random variables, it’s often helpful to write ‘$\mathrm{E}(X\mid Y)$’ for
“the expectation of $X$ given $Y$.” Using that notation, Eq. (11) becomes simply

$$\mathrm{E}\,X = \mathrm{E}\bigl(\mathrm{E}(X\mid Y)\bigr). \tag{12}$$

This is a truly marvelous identity, great for hand-waving and for impressing
outsiders, except that it can be confusing until you understand what it means.

In the first place, if $Y$ is a Boolean variable, ‘$\mathrm{E}(X\mid Y)$’ might look as if it
means ‘$\mathrm{E}(X\mid Y = 1)$’, thus asserting that $Y$ is true, just as ‘$\mathrm{E}(X\mid A)$’ asserts the
truth of $A$ in (10). No; that interpretation is wrong, quite wrong. Be warned.

In the second place, you might think of $\mathrm{E}(X\mid Y)$ as a function of $Y$. Well,
yes; but the best way to understand $\mathrm{E}(X\mid Y)$ is to regard it as a random variable.
That’s why we’re allowed to compute its expected value in (12).

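All random variables are functions of the atomic events $\omega$. The value of
$\mathrm{E}(X\mid Y)$ at $\omega$ is the average of $X(\omega')$ over all events $\omega'$ such that $Y(\omega') = Y(\omega)$:

$$\mathrm{E}(X\mid Y)(\omega) = \sum_{\omega'\in\Omega} X(\omega')\Pr(\omega')\,[Y(\omega') = Y(\omega)]\,/\,\Pr(Y = Y(\omega)). \tag{13}$$

Similarly, $\mathrm{E}(X\mid Y_1,\ldots,Y_r)$ averages over events with $Y_j(\omega') = Y_j(\omega)$ for $1\le j\le r$.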
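Treating E(X|Y) as a random variable becomes concrete once it is written as a function of the atomic event. The sketch below assumes a small illustrative space (two fair dice, with X the total and Y the larger die); the names `expect` and `cond_expect` are hypothetical helpers that follow the averaging rule of Eq. (13).

```python
from fractions import Fraction
from itertools import product

# Assumed probability space: two fair dice; each atomic event (a, b) has probability 1/36.
OMEGA = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}

def expect(X):
    """E X as in Eq. (7); X is a function of the atomic event."""
    return sum(X(w) * p for w, p in OMEGA.items())

def cond_expect(X, Y):
    """Return E(X|Y) as a random variable, i.e. as a function of the atomic event."""
    def EXY(w):
        same = [(v, p) for v, p in OMEGA.items() if Y(v) == Y(w)]
        return sum(X(v) * p for v, p in same) / sum(p for _, p in same)
    return EXY

X = lambda w: w[0] + w[1]   # total shown by the two dice
Y = lambda w: max(w)        # larger of the two dice

E_X_given_Y = cond_expect(X, Y)
print(expect(X), expect(E_X_given_Y), E_X_given_Y((6, 6)))
```

Because `E_X_given_Y` is itself a function on the atomic events, `expect` can be applied to it, and the result agrees with `expect(X)`, as Eq. (12) asserts.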


For example, suppose $X_1$ through $X_n$ are binary random variables constrained
by the condition that $\nu(X_1\ldots X_n) = X_1 + \cdots + X_n = m$, where $m$ and $n$
are constants with $0\le m\le n$; all $\binom{n}{m}$ such bit vectors $X_1\ldots X_n$ are assumed to
be equally likely. Clearly $\mathrm{E}\,X_1 = m/n$. But what is $\mathrm{E}(X_2\mid X_1)$? If $X_1 = 0$, the
expectation of $X_2$ is $m/(n-1)$; otherwise that expectation is $(m-1)/(n-1)$;
consequently $\mathrm{E}(X_2\mid X_1) = (m - X_1)/(n-1)$. And what is $\mathrm{E}(X_k\mid X_1,\ldots,X_{k-1})$?
The answer is easy, once you get used to the notation: If $\nu(X_1\ldots X_{k-1}) = r$,
then $X_k\ldots X_n$ is a random bit vector with $\nu(X_k\ldots X_n) = m - r$; hence the
average value of $X_k$ will be $(m-r)/(n+1-k)$ in that case. We conclude that

$$\mathrm{E}(X_k\mid X_1,\ldots,X_{k-1}) = \frac{m - \nu(X_1\ldots X_{k-1})}{n+1-k}, \qquad\text{for } 1\le k\le n. \tag{14}$$

The random variables on both sides of these equations are the same.
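Equation (14) is easy to confirm by enumeration for small parameters. In the sketch below, `check_eq14` is a hypothetical helper that groups the bit vectors by their first k − 1 bits and compares the average of the kth bit in each group with the right-hand side of (14).

```python
from fractions import Fraction
from itertools import combinations

def check_eq14(n, m):
    """Verify E(X_k | X_1..X_{k-1}) = (m - nu)/(n + 1 - k) by enumeration."""
    # All bit vectors of length n with exactly m ones, assumed equally likely.
    vectors = []
    for ones in combinations(range(n), m):
        vectors.append(tuple(1 if i in ones else 0 for i in range(n)))
    for k in range(1, n + 1):
        groups = {}  # prefix (x_1..x_{k-1}) -> list of observed x_k values
        for v in vectors:
            groups.setdefault(v[:k - 1], []).append(v[k - 1])
        for prefix, xs in groups.items():
            average = Fraction(sum(xs), len(xs))
            if average != Fraction(m - sum(prefix), n + 1 - k):
                return False
    return True

print(check_eq14(6, 3))
```

Only prefixes that actually occur are examined, which matches the fact that E(X|Y) is defined on atomic events of the space.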


Inequalities. In practice we often want to prove that certain events are rare,
in the sense that they occur with very small probability. Conversely, our goal
is sometimes to show that an event is not rare. And we’re in luck, because
mathematicians have devised several fairly easy ways to derive upper bounds or
lower bounds on probabilities, even when the exact values are unknown.



We’ve already discussed the most important technique of this kind in Section 1.2.10.
Stated in highly general terms, the basic idea can be formulated as
follows: Let $f$ be any nonnegative function such that $f(x)\ge s > 0$ when $x\in S$.
Then

$$\Pr(X\in S) \le \mathrm{E}\,f(X)/s, \tag{15}$$

provided that $\Pr(X\in S)$ and $\mathrm{E}\,f(X)$ both exist. For example, $f(x) = |x|$ yields

$$\Pr(|X|\ge m) \le \mathrm{E}\,|X|/m \tag{16}$$

whenever $m > 0$. The proof is amazingly simple, because we obviously have

$$\mathrm{E}\,f(X) \ge \Pr(X\in S)\cdot s + \Pr(X\notin S)\cdot 0. \tag{17}$$

Formula (15) is often called Markov’s inequality, because A. A. Markov discussed
the special case $f(x) = |x|^a$ in Izviestiia Imp. Akad. Nauk (6) 1 (1907), 707-716.
If we set $f(x) = (x - \mathrm{E}\,X)^2$, we get the famous 19th-century inequality of
Bienaymé and Chebyshev:

$$\Pr(|X - \mathrm{E}\,X|\ge r) \le \mathrm{var}(X)/r^2. \tag{18}$$

The case $f(x) = e^{ax}$ is also extremely useful.
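To see how loose or tight (16) and (18) can be, one can compare both sides exactly on a small distribution. The sketch below assumes X is the total of two fair dice, a choice made only for illustration; `markov_ok` and `chebyshev_ok` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

# Assumed distribution: X = total of two fair dice, stored as {value: probability}.
DIST = {}
for a, b in product(range(1, 7), repeat=2):
    DIST[a + b] = DIST.get(a + b, 0) + Fraction(1, 36)

def expect(f):
    return sum(f(x) * p for x, p in DIST.items())

def pr(pred):
    return sum(p for x, p in DIST.items() if pred(x))

mean = expect(lambda x: x)
var = expect(lambda x: (x - mean) ** 2)

def markov_ok(m):
    """Eq. (16): Pr(|X| >= m) <= E|X| / m."""
    return pr(lambda x: abs(x) >= m) <= expect(abs) / m

def chebyshev_ok(r):
    """Eq. (18): Pr(|X - EX| >= r) <= var(X) / r^2."""
    return pr(lambda x: abs(x - mean) >= r) <= var / r ** 2

print(mean, var, pr(lambda x: abs(x - mean) >= 4), var / 16)
```

For r = 4 the exact probability is 1/6 while the bound from (18) is 35/96, so the inequality holds with room to spare in this case.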

Another fundamental estimate, known as Jensen’s inequality [Acta Mathematica
30 (1906), 175-193], applies to convex functions $f$; we’ve seen it so far
only as a “hint” to exercise 6.2.2-36(1). The real-valued function $f$ is said to be
convex in an interval $I$ of the real line, and $-f$ is said to be concave in $I$, if

$$f(px + qy) \le p\,f(x) + q\,f(y) \qquad\text{for all } x, y\in I, \tag{19}$$

whenever $p\ge 0$, $q\ge 0$, and $p + q = 1$. This condition turns out to be equivalent to
saying that $f''(x)\ge 0$ for all $x\in I$, if $f$ has a second derivative $f''$. For example,
the functions $e^{ax}$ and $x^{2n}$ are convex for all constants $a$ and all nonnegative
integers $n$; and if we restrict consideration to positive values of $x$, then $f(x) = x^n$
is convex for all integers $n$ (notably $f(x) = 1/x$ when $n = -1$). The functions
$\ln(1/x)$ and $x\ln x$ are also convex for $x > 0$. Jensen’s inequality states that

$$f(\mathrm{E}\,X) \le \mathrm{E}\,f(X) \tag{20}$$

when $f$ is convex in the interval $I$ and the random variable $X$ takes values only
in $I$. (See exercise 42 for a proof.) For example, we have $1/\mathrm{E}\,X \le \mathrm{E}(1/X)$ and
$\ln \mathrm{E}\,X \ge \mathrm{E}\ln X$ and $(\mathrm{E}\,X)\ln \mathrm{E}\,X \le \mathrm{E}(X\ln X)$, when $X$ is positive. Notice that
(20) actually reduces to the very definition of convexity, (19), in the special case
when $X = x$ with probability $p$ and $X = y$ with probability $q$.
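The three consequences of Jensen’s inequality quoted above can be spot-checked numerically. The sketch below assumes an arbitrary positive four-point distribution chosen for illustration and uses floating-point arithmetic, so it is a sanity check rather than a proof.

```python
import math

# Assumed positive random variable, given as (value, probability) pairs summing to 1.
DIST = [(1, 0.5), (2, 0.25), (4, 0.125), (8, 0.125)]

def expect(f):
    return sum(f(x) * p for x, p in DIST)

mean = expect(lambda x: x)  # 0.5 + 0.5 + 0.5 + 1.0

checks = {
    "1/EX <= E(1/X)": 1 / mean <= expect(lambda x: 1 / x),
    "ln EX >= E ln X": math.log(mean) >= expect(math.log),
    "EX ln EX <= E(X ln X)": mean * math.log(mean) <= expect(lambda x: x * math.log(x)),
}

for name, ok in checks.items():
    print(name, ok)
```

Each line corresponds to one convex function from the text: 1/x, ln(1/x) (which reverses the direction for ln), and x ln x.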

Third and fourth on our list of remarkably useful inequalities are two classical
results that apply to any random variable $X$ whose values are nonnegative
integers:

$$\Pr(X > 0) \le \mathrm{E}\,X; \qquad\text{(“the first moment principle”)} \tag{21}$$

$$\Pr(X > 0) \ge (\mathrm{E}\,X)^2/(\mathrm{E}\,X^2). \qquad\text{(“the second moment principle”)} \tag{22}$$

Formula (21) is obvious, because the left side is $p_1 + p_2 + p_3 + \cdots$ when $p_k$ is the
probability that $X = k$, while the right side is $p_1 + 2p_2 + 3p_3 + \cdots$.




Formula (22) isn’t quite so obvious; it is $p_1 + p_2 + p_3 + \cdots$ on the left and
$(p_1 + 2p_2 + 3p_3 + \cdots)^2/(p_1 + 4p_2 + 9p_3 + \cdots)$ on the right. However, as we saw
with Markov’s inequality, there is a remarkably simple proof, once we happen to
discover it:

$$\begin{aligned}
\mathrm{E}\,X^2 &= \mathrm{E}(X^2\mid X > 0)\Pr(X > 0) + \mathrm{E}(X^2\mid X = 0)\Pr(X = 0)\\
&= \mathrm{E}(X^2\mid X > 0)\Pr(X > 0)\\
&\ge \bigl(\mathrm{E}(X\mid X > 0)\bigr)^2\Pr(X > 0) = (\mathrm{E}\,X)^2/\Pr(X > 0).
\end{aligned} \tag{23}$$

In fact this proof shows that the second moment principle is valid even when $X$ is
not restricted to integer values (see exercise 46). Furthermore the argument can
be strengthened to show that (22) holds even when $X$ can take arbitrary negative
values, provided only that $\mathrm{E}\,X \ge 0$ (see exercise 47). See also exercise 118.

Exercise 54 applies (21) and (22) to the study of random graphs.
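A small exact computation shows how the two moment principles bracket Pr(X > 0). The sketch below assumes, purely for illustration, that X counts the fixed points of a uniformly random permutation of five elements; `fixed_point_moments` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import permutations

def fixed_point_moments(n):
    """Return (Pr(X > 0), E X, E X^2) for X = number of fixed points of a random permutation."""
    perms = list(permutations(range(n)))
    counts = [sum(1 for i, v in enumerate(p) if i == v) for p in perms]
    total = len(perms)
    pr_pos = Fraction(sum(1 for c in counts if c > 0), total)
    ex = Fraction(sum(counts), total)
    ex2 = Fraction(sum(c * c for c in counts), total)
    return pr_pos, ex, ex2

pr_pos, ex, ex2 = fixed_point_moments(5)
lower, upper = ex ** 2 / ex2, ex   # right sides of (22) and (21)
print(lower, pr_pos, upper)
```

Here the lower bound from (22) is 1/2 and the upper bound from (21) is 1, with the exact probability 19/30 between them.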

Another important inequality, which applies in the special case where
$X = X_1 + \cdots + X_m$ is the sum of binary random variables $X_j$, was introduced more
recently by S. M. Ross [Probability, Statistics, and Optimization (New York:
Wiley, 1994), 185-190], who calls it the “conditional expectation inequality”:

$$\Pr(X > 0) \ge \sum_{j=1}^{m}\frac{\mathrm{E}\,X_j}{\mathrm{E}(X\mid X_j = 1)}. \tag{24}$$




Ross showed that the right-hand side of this inequality is always at least as big
as the bound $(\mathrm{E}\,X)^2/(\mathrm{E}\,X^2)$ that we get from the second moment principle (see
exercise 50). Furthermore, (24) is often easier to compute, even though it may
look more complicated at first glance.

For example, his method applies nicely to the problem of estimating a
reliability polynomial, $f(p_1,\ldots,p_n)$, when $f$ is a monotone Boolean function;
here $p_j$ represents the probability that component $j$ of a system is “up.” We observed
in Section 7.1.4 that reliability polynomials can be evaluated exactly, using
BDD methods, when $n$ is reasonably small; but approximations are necessary
when $f$ gets complicated. The simple example $f(x_1,\ldots,x_5) = x_1x_2x_3\lor x_2x_3x_4\lor x_4x_5$
illustrates Ross’s general method: Let $(Y_1,\ldots,Y_5)$ be independent binary
random variables, with $\mathrm{E}\,Y_j = p_j$; and let $X = X_1 + X_2 + X_3$, where $X_1 = Y_1Y_2Y_3$,
$X_2 = Y_2Y_3Y_4$, and $X_3 = Y_4Y_5$ correspond to the prime implicants of $f$. Then
$\Pr(X > 0) = \Pr(f(Y_1,\ldots,Y_5) = 1) = \mathrm{E}\,f(Y_1,\ldots,Y_5) = f(p_1,\ldots,p_5)$, because
the $Y$’s are independent. And we can evaluate the bound in (24) easily:


$$\Pr(X > 0) \ge \frac{p_1p_2p_3}{1 + p_4 + p_4p_5} + \frac{p_2p_3p_4}{p_1 + 1 + p_5} + \frac{p_4p_5}{p_1p_2p_3 + p_2p_3 + 1}. \tag{25}$$

If, for example, each $p_j$ is 0.9, this formula gives $\approx 0.848$, while $(\mathrm{E}\,X)^2/(\mathrm{E}\,X^2) \approx 0.847$;
the true value, $p_1p_2p_3 + p_2p_3p_4 + p_4p_5 - p_1p_2p_3p_4 - p_2p_3p_4p_5$, is 0.9558.
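The three numbers just quoted can be reproduced mechanically. The following sketch is one possible way to do it, assuming the components are independent with every p_j = 0.9; the function names are hypothetical, and the conditional means are computed by dropping the components already known to be up.

```python
from itertools import product

P = [0.9] * 5                                  # p_1..p_5 (assumed all equal to 0.9)
IMPLICANTS = [(0, 1, 2), (1, 2, 3), (3, 4)]    # x1x2x3, x2x3x4, x4x5 with 0-based indices

def prob_all_up(indices, given=()):
    """Pr(all Y_i = 1 for i in indices, given that Y_i = 1 for i in `given`)."""
    result = 1.0
    for i in set(indices) - set(given):
        result *= P[i]
    return result

def ross_bound():
    """Sum over j of E X_j / E(X | X_j = 1), the conditional expectation bound."""
    total = 0.0
    for j in IMPLICANTS:
        cond_mean = sum(prob_all_up(i, given=j) for i in IMPLICANTS)
        total += prob_all_up(j) / cond_mean
    return total

def second_moment_bound():
    ex = sum(prob_all_up(j) for j in IMPLICANTS)
    ex2 = sum(prob_all_up(set(i) | set(j)) for i in IMPLICANTS for j in IMPLICANTS)
    return ex * ex / ex2

def exact_reliability():
    """Pr(f(Y) = 1), summing the weights of all 2^5 component states."""
    total = 0.0
    for y in product((0, 1), repeat=5):
        if any(all(y[i] for i in imp) for imp in IMPLICANTS):
            weight = 1.0
            for yi, pi in zip(y, P):
                weight *= pi if yi else 1 - pi
            total += weight
    return total

print(round(ross_bound(), 3), round(second_moment_bound(), 3), round(exact_reliability(), 4))
```

Changing the entries of `P` shows how the gap between the two lower bounds and the exact value varies with the component reliabilities.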

Many other important inequalities relating to expected values have been
discovered, of which the most significant for our purposes in this book is the
FKG inequality discussed in exercise 61. It yields easy proofs that certain events
are correlated, as illustrated in exercise 62.
Martingales. A sequence of dependent random variables can be difficult to
analyze, but if those variables obey invariant constraints we can often exploit
their structure. In particular, the “martingale” property, named after a classic
betting strategy (see exercise 67), proves to be amazingly useful when it applies.
Joseph L. Doob featured martingales in his pioneering book Stochastic Processes
(New York: Wiley, 1953), and developed their extensive theory.

The sequence $\langle Z_n\rangle = Z_0, Z_1, \ldots$ of real-valued random variables is
called a martingale if it satisfies the condition

$$\mathrm{E}(Z_{n+1}\mid Z_0,\ldots,Z_n) = Z_n \qquad\text{for all } n\ge 0. \tag{26}$$

(We also implicitly assume, as usual, that the expectations $\mathrm{E}\,Z_n$ are well defined.)
For example, when $n = 0$, the random variable $\mathrm{E}(Z_1\mid Z_0)$ must be the same as
the random variable $Z_0$ (see exercise 63).

Figure 1 illustrates George Pólya’s famous “urn model” [F. Eggenberger
and G. Pólya, Zeitschrift für angewandte Math. und Mech. 3 (1923), 279-289],
which is associated with a particularly interesting martingale. Imagine an urn
that initially contains two balls, one red and one black. Repeatedly remove a
randomly chosen ball from the urn, then replace it and contribute a new ball of
the same color. The numbers $(r, b)$ of red and black balls will follow a path in
the diagram, with the respective local probabilities indicated on each branch.

One can show without difficulty that all $n+1$ nodes on level $n$ of Fig. 1 will be
reached with the same probability, $1/(n+1)$. Furthermore, the probability that
a red ball is chosen when going from any level to the next is always 1/2. Thus
the urn scheme might seem at first glance to be rather tame and uniform. But
in fact the process turns out to be full of surprises, because any inequity between
red and black tends to perpetuate itself. For example, if the first ball chosen is
black, so that we go from (1, 1) to (1, 2), the probability is only $2\ln 2 - 1 \approx .386$
that the red balls will ever overtake the black ones in the future (see exercise 88).

One good way to analyze Pólya’s process is to use the fact that the ratios
$r/(r+b)$ form a martingale. Each visit to the urn changes this ratio either to
$(r+1)/(r+b+1)$ (with probability $r/(r+b)$) or to $r/(r+b+1)$ (with probability
$b/(r+b)$); so the expected new ratio is $(rb + r^2 + r)/((r+b)(r+b+1)) = r/(r+b)$,
no different from what it was before. More formally, let $X_0 = 1$, and for $n > 0$
let $X_n$ be the random variable ‘[the $n$th ball chosen is red]’. Then there are
$X_0 + \cdots + X_n$ red balls and $\bar X_0 + \cdots + \bar X_n + 1$ black balls at level $n$ of Fig. 1
(where $\bar X_j = 1 - X_j$); and the sequence $\langle Z_n\rangle$ is a martingale if we define

$$Z_n = (X_0 + \cdots + X_n)/(n+2). \tag{27}$$
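The uniformity of each level and the martingale property of the red fraction can both be confirmed with exact rational arithmetic. The sketch below is an illustration only; `urn_levels` and `expected_next_ratio` are hypothetical helper names.

```python
from fractions import Fraction

def urn_levels(depth):
    """Exact distribution of the urn state (r, b) on levels 0..depth, starting from (1, 1)."""
    level = {(1, 1): Fraction(1)}
    levels = [level]
    for _ in range(depth):
        nxt = {}
        for (r, b), p in level.items():
            # A red ball is drawn with probability r/(r+b); a ball of that color is added.
            nxt[(r + 1, b)] = nxt.get((r + 1, b), 0) + p * Fraction(r, r + b)
            nxt[(r, b + 1)] = nxt.get((r, b + 1), 0) + p * Fraction(b, r + b)
        level = nxt
        levels.append(level)
    return levels

def expected_next_ratio(r, b):
    """Expected value of the red fraction after one more draw from state (r, b)."""
    return (Fraction(r, r + b) * Fraction(r + 1, r + b + 1)
            + Fraction(b, r + b) * Fraction(r, r + b + 1))

levels = urn_levels(6)
print(sorted(levels[3].items()))
print(expected_next_ratio(2, 5), Fraction(2, 7))
```

Every state on level 3 appears with probability 1/4, and the expected next ratio from (2, 5) equals the current ratio 2/7.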

In practice it’s usually most convenient to define martingales $Z_0$, $Z_1$, …
in terms of auxiliary random variables $X_0$, $X_1$, …, as we’ve just done. The
sequence $\langle Z_n\rangle$ is said to be a martingale with respect to the sequence $\langle X_n\rangle$ if
$Z_n$ is a function of $(X_0,\ldots,X_n)$ that satisfies

$$\mathrm{E}(Z_{n+1}\mid X_0,\ldots,X_n) = Z_n \qquad\text{for all } n\ge 0. \tag{28}$$
[Figure 1: tree diagram of urn states (r, b) on levels 0 through 3, with a local probability on each branch; diagram not reproduced here.]

Fig. 1. Pólya’s urn model. The probability of taking any downward path
from (1, 1) to (r, b) is the product of the probabilities shown on the branches.

Furthermore we say that a sequence $\langle Y_n\rangle$ is fair with respect to the sequence $\langle X_n\rangle$
if $Y_n$ is a function of $(X_0,\ldots,X_n)$ that satisfies the simpler condition

$$\mathrm{E}(Y_{n+1}\mid X_0,\ldots,X_n) = 0 \qquad\text{for all } n\ge 0; \tag{29}$$

and we call $\langle Y_n\rangle$ fair whenever

$$\mathrm{E}(Y_{n+1}\mid Y_0,\ldots,Y_n) = 0 \qquad\text{for all } n\ge 0. \tag{30}$$

Exercise 77 proves that (28) implies (26) and that (29) implies (30); thus an
auxiliary sequence $\langle X_n\rangle$ is sufficient but not necessary for defining martingales
and fair sequences.

Whenever $\langle Z_n\rangle$ is a martingale, we obtain a fair sequence $\langle Y_n\rangle$ by letting
$Y_0 = Z_0$ and $Y_n = Z_n - Z_{n-1}$ for $n > 0$, because the identity
$\mathrm{E}(Y_{n+1}\mid Z_0,\ldots,Z_n) = \mathrm{E}(Z_{n+1} - Z_n\mid Z_0,\ldots,Z_n) = Z_n - Z_n = 0$
shows that $\langle Y_n\rangle$ is fair
with respect to $\langle Z_n\rangle$. Conversely, whenever $\langle Y_n\rangle$ is fair, we obtain a martingale
$\langle Z_n\rangle$ by letting $Z_n = Y_0 + \cdots + Y_n$, because the identity
$\mathrm{E}(Z_{n+1}\mid Y_0,\ldots,Y_n) = \mathrm{E}(Z_n + Y_{n+1}\mid Y_0,\ldots,Y_n) = Z_n$
shows that $\langle Z_n\rangle$ is a martingale with respect
to $\langle Y_n\rangle$. In other words, fairness and martingaleness are essentially equivalent.
The $Y$’s represent unbiased “tweaks” that change one $Z$ to its successor.

It’s easy to construct fair sequences. For example, every sequence of independent
random variables with mean 0 is fair. And if $\langle Y_n\rangle$ is fair with respect
to $\langle X_n\rangle$, so is the sequence $\langle Y'_n\rangle$ defined by $Y'_n = f_n(X_0,\ldots,X_{n-1})\,Y_n$
when $f_n(X_0,\ldots,X_{n-1})$ is almost any function whatsoever! (We need only
keep $f_n$ small enough that $\mathrm{E}\,Y'_n$ is well defined.) In particular, we can let
$f_n(X_0,\ldots,X_{n-1}) = 0$ for all large $n$, thereby making $\langle Z_n\rangle$ eventually fixed.

A sequence of functions $N_n(x_0,\ldots,x_{n-1})$ is called a stopping rule if each
value is either 0 or 1 and if $N_n(x_0,\ldots,x_{n-1}) = 0$ implies $N_{n+1}(x_0,\ldots,x_n) = 0$.
We can assume that $N_0 = 1$. The number of steps before stopping, with respect
to a sequence of random variables $\langle X_n\rangle$, is then the random variable

$$N = N_1(X_0) + N_2(X_0, X_1) + N_3(X_0, X_1, X_2) + \cdots. \tag{31}$$

(Intuitively, $N_n(x_0,\ldots,x_{n-1})$ means [the values $X_0 = x_0$, …, $X_{n-1} = x_{n-1}$ do
not stop the process]; hence it’s really more about “going” than “stopping.”)
Any martingale $Z_n = Y_0 + \cdots + Y_n$ with respect to $\langle X_n\rangle$ can be adapted to
stop with this strategy if we change it to $Z'_n = Y'_0 + \cdots + Y'_n$, where
$Y'_n = N_n(X_0,\ldots,X_{n-1})\,Y_n$. Gamblers who wish to “quit when ahead” are using the
stopping rule $N_{n+1}(X_0,\ldots,X_n) = [Z'_n \le 0]$, when $Z'_n$ is their current balance.

Notice that if the stopping rule always stops after at most $m$ steps (in
other words, if the function $N_m(x_0,\ldots,x_{m-1})$ is identically zero), then we have
$Z'_N = Z'_m$, because $Z'_n$ doesn’t change after the process has stopped. Therefore
$\mathrm{E}\,Z'_N = \mathrm{E}\,Z'_m = \mathrm{E}\,Z'_0 = \mathrm{E}\,Z_0$: No stopping rule can change the expected outcome
of a martingale when the number of steps is bounded.
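The bounded-stopping principle can be watched in action on the “quit when ahead” rule. The sketch below assumes the simplest fair game, a one-unit bet on each of m fair coin flips starting from balance 0, and averages the stopped balance over all 2^m flip sequences; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def stopped_balance(steps):
    """Final balance when betting one unit per fair flip and quitting
    as soon as the balance is positive ("quit when ahead")."""
    balance = 0
    for y in steps:
        if balance > 0:        # the rule [balance <= 0] is false, so the process has stopped
            break
        balance += y
    return balance

def expected_stopped_balance(m):
    """Average stopped balance over all 2^m equally likely sequences of +1/-1 steps."""
    outcomes = [stopped_balance(s) for s in product((+1, -1), repeat=m)]
    return Fraction(sum(outcomes), len(outcomes))

def prob_ahead(m):
    """Probability that the gambler ends with a positive balance within m flips."""
    outcomes = [stopped_balance(s) for s in product((+1, -1), repeat=m)]
    return Fraction(sum(1 for z in outcomes if z > 0), len(outcomes))

for m in (1, 3, 9):
    print(m, prob_ahead(m), expected_stopped_balance(m))
```

The chance of finishing ahead grows with m, yet the expected balance stays at 0, because the rare losing runs are correspondingly deep.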

An amusing game of chance called Ace Now illustrates this optional stopping
principle. Take a deck of cards, shuffle it and place the cards face down; then
turn them face up one at a time as follows: Just before seeing the nth card, you
are supposed to say either “Stop” or “Deal,” based on the cards you’ve already
observed. (If n = 52 you must say “Stop.”) After you’ve decided to stop, you
win $12 if the next card is an ace; otherwise you lose $1. What is the best
strategy for playing this game? Should you hold back until you have a pretty
good chance at the $12? What is the worst strategy? Exercise 82 has the answer.
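Readers who prefer to experiment before turning to exercise 82 can evaluate the game by dynamic programming. The sketch below assumes a standard 52-card deck with four aces and describes the state by the numbers of cards and aces still face down; `ace_now_value` is a hypothetical helper that takes `max` for the best strategy and `min` for the worst.

```python
from fractions import Fraction
from functools import lru_cache

def ace_now_value(choose):
    """Expected winnings when `choose` (max or min) picks between Stop and Deal."""
    @lru_cache(maxsize=None)
    def value(cards_left, aces_left):
        # Stop now: win 12 if the next card is an ace, otherwise lose 1.
        stop = Fraction(12 * aces_left - (cards_left - aces_left), cards_left)
        if cards_left == 1:          # last card: the player must say "Stop"
            return stop
        # Deal: the next card is revealed and play continues from the new state.
        deal = Fraction(0)
        if aces_left > 0:
            deal += Fraction(aces_left, cards_left) * value(cards_left - 1, aces_left - 1)
        if cards_left > aces_left:
            deal += Fraction(cards_left - aces_left, cards_left) * value(cards_left - 1, aces_left)
        return choose(stop, deal)
    return value(52, 4)

print(ace_now_value(max), ace_now_value(min))
```

Comparing the two printed values indicates how much, if anything, a clever stopping strategy can gain over a poor one.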

Tail inequalities from martingales. The essence of martingales is equality
of expectations. Yet martingales turn out to be important in the analysis of
algorithms because we can use them to derive inequalities, namely to show that
certain events occur with very small probability.

To begin our study, let’s introduce inequality into Eq. (26): A sequence $\langle Z_n\rangle$
is called a submartingale if it satisfies

$$\mathrm{E}(Z_{n+1}\mid Z_0,\ldots,Z_n) \ge Z_n \qquad\text{for all } n\ge 0. \tag{32}$$

Similarly, it’s called a supermartingale if ‘$\ge$’ is changed to ‘$\le$’ in the left-hand
part of this definition. (Thus a martingale is both sub- and super-.) In a
submartingale we have $\mathrm{E}\,Z_0 \le \mathrm{E}\,Z_1 \le \mathrm{E}\,Z_2 \le \cdots$, by taking expectations in (32).
A supermartingale, similarly, has ever smaller expectations as $n$ grows. One way
to remember the difference between submartingales and supermartingales is to
observe that their names are the reverse of what you might expect.

Submartingales are significant largely because of the fact that they’re quite
common. Indeed, if $\langle Z_n\rangle$ is any martingale and if $f$ is any convex function, then
$\langle f(Z_n)\rangle$ is a submartingale (see exercise 84). For example, the sequences $\langle |Z_n|\rangle$
and $\langle\max(Z_n, c)\rangle$ and $\langle Z_n^2\rangle$ and $\langle e^{Z_n}\rangle$ all are submartingales whenever $\langle Z_n\rangle$ is
known to be a martingale. If, furthermore, $Z_n$ is always positive, then $\langle Z_n^3\rangle$ and
$\langle 1/Z_n\rangle$ and $\langle\ln(1/Z_n)\rangle$ and $\langle Z_n\ln Z_n\rangle$, etc., are submartingales.

If we modify a submartingale by applying a stopping rule, it’s easy to see that
we get another submartingale. Furthermore, if that stopping rule is guaranteed
to quit within $m$ steps, we’ll have $\mathrm{E}\,Z_m \ge \mathrm{E}\,Z'_m = \mathrm{E}\,Z'_N$. Therefore no
stopping rule can increase the expected outcome of a submartingale, when the
number of steps is bounded.

That comparatively simple observation has many important consequences.
For example, exercise 86 uses it to give a simple proof of the so-called “maximal
inequality”: If $\langle Z_n\rangle$ is a nonnegative submartingale then

$$\Pr(\max(Z_0, Z_1, \ldots, Z_n) \ge x) \le (\mathrm{E}\,Z_n)/x, \qquad\text{for all } x > 0. \tag{33}$$


Special cases of this inequality are legion. For instance, martingales $\langle Z_n\rangle$ satisfy

$$\Pr(\max(|Z_0|, |Z_1|, \ldots, |Z_n|) \ge x) \le \mathrm{E}(|Z_n|)/x, \qquad\text{for all } x > 0; \tag{34}$$

$$\Pr(\max(Z_0^2, Z_1^2, \ldots, Z_n^2) \ge x) \le \mathrm{E}(Z_n^2)/x, \qquad\text{for all } x > 0. \tag{35}$$

Relation (35) is known as Kolmogorov’s inequality, because A. N. Kolmogorov
proved it when $Z_n = X_1 + \cdots + X_n$ is the sum of independent random variables
with $\mathrm{E}\,X_k = 0$ and $\mathrm{var}(X_k) = \sigma_k^2$ for $1\le k\le n$ [Math. Annalen 99 (1928), 309-311].
In that case $\mathrm{var}(Z_n) = \sigma_1^2 + \cdots + \sigma_n^2 = \sigma^2$, and the inequality can be written

$$\Pr(|X_1| < t\sigma,\ |X_1 + X_2| < t\sigma,\ \ldots,\ |X_1 + \cdots + X_n| < t\sigma) \ge 1 - 1/t^2. \tag{36}$$

Chebyshev’s inequality gives only $\Pr(|X_1 + \cdots + X_n| < t\sigma) \ge 1 - 1/t^2$, which is
a considerably weaker result.
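The difference between (36) and the Chebyshev statement is that the former controls every partial sum at once. The sketch below assumes independent fair ±1 steps, so that σ² = n, and counts exactly how often all partial sums (or only the last one) stay strictly within tσ; the helper name is hypothetical.

```python
from fractions import Fraction
from itertools import product, accumulate

def kolmogorov_vs_chebyshev(n, t):
    """Return (Pr(all |S_k| < t*sigma), Pr(|S_n| < t*sigma), 1 - 1/t^2)
    for S_k = X_1 + ... + X_k with independent fair +1/-1 steps."""
    bound_sq = t * t * n            # (t*sigma)^2, because sigma^2 = n here
    all_small = last_small = 0
    for steps in product((+1, -1), repeat=n):
        sums = list(accumulate(steps))
        if all(s * s < bound_sq for s in sums):
            all_small += 1
        if sums[-1] ** 2 < bound_sq:
            last_small += 1
    total = 2 ** n
    return (Fraction(all_small, total), Fraction(last_small, total),
            1 - Fraction(1, t * t))

every, last, bound = kolmogorov_vs_chebyshev(10, 2)
print(float(every), float(last), float(bound))
```

The first probability is never larger than the second, yet both are guaranteed to be at least the same bound 1 − 1/t².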

Another important inequality applies in the common case where we have
good bounds on the terms $Y_1$, …, $Y_n$ that enter into the standard representation
$Z_n = Y_0 + Y_1 + \cdots + Y_n$ of a martingale. This one is called the Hoeffding-Azuma
inequality, after papers by W. Hoeffding [J. Amer. Statistical Association 58
(1963), 13-30] and K. Azuma [Tohoku Math. Journal (2) 19 (1967), 357-367].
It reads as follows: If $\langle Y_n\rangle$ is any fair sequence with $a_n \le Y_n \le b_n$, then

$$\Pr(Y_1 + \cdots + Y_n \ge x) \le e^{-2x^2/((b_1-a_1)^2 + \cdots + (b_n-a_n)^2)}. \tag{37}$$

The same bound applies to $\Pr(Y_1 + \cdots + Y_n \le -x)$, since $-b_n \le -Y_n \le -a_n$; so

$$\Pr(|Y_1 + \cdots + Y_n| \ge x) \le 2e^{-2x^2/((b_1-a_1)^2 + \cdots + (b_n-a_n)^2)}. \tag{38}$$

Exercise 90 breaks the proof of this result into small steps. In fact, the proof
even shows that $a_n$ and $b_n$ may be functions of $\{Y_0,\ldots,Y_{n-1}\}$.
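For a feel of how sharp (37) is, the bound can be compared with an exact tail. The sketch below assumes the simplest fair sequence, independent fair ±1 variables, so that a_k = −1 and b_k = +1 and the exponent becomes −2x²/(4n); the function names are hypothetical.

```python
import math
from fractions import Fraction

def exact_tail(n, x):
    """Pr(Y_1 + ... + Y_n >= x) for independent fair +1/-1 variables Y_k."""
    # With h heads the sum is 2h - n, so the event is 2h - n >= x.
    ways = sum(math.comb(n, h) for h in range(n + 1) if 2 * h - n >= x)
    return Fraction(ways, 2 ** n)

def hoeffding_azuma_bound(n, x):
    """Right side of (37) with a_k = -1 and b_k = +1 for every k."""
    return math.exp(-2 * x * x / (4 * n))

for x in (10, 20, 30):
    print(x, float(exact_tail(100, x)), hoeffding_azuma_bound(100, x))
```

The exact tail stays below the bound for every x, and both shrink rapidly once x exceeds a few multiples of √n.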

Applications. The Hoeffding-Azuma inequality is useful in the analysis of
many algorithms because it applies to “Doob martingales,” a very general class
of martingales that J. L. Doob featured as Example 1 in his Stochastic Processes
(1953), page 92. (In fact, he had already considered them many years earlier,
in Trans. Amer. Math. Soc. 47 (1940), 486.) Doob martingales arise from any
sequence of random variables $\langle X_n\rangle$, independent or not, and from any other
random variable $Q$: We simply define

$$Z_n = \mathrm{E}(Q\mid X_0,\ldots,X_n). \tag{39}$$

Then, as Doob pointed out, the resulting sequence is a martingale (see exercise
91). In our applications, $Q$ is an aspect of some algorithm that we wish to study,
and the variables $X_0$, $X_1$, … reflect the inputs to the algorithm. For example,
in an algorithm that uses random bits, the $X$’s are those bits.

Consider a hashing algorithm in which $t$ objects are placed into $m$ random
lists, where the $n$th object goes into list $X_n$; thus $1\le X_n\le m$ for $1\le n\le t$, and
we assume that each of the $m^t$ possibilities is equally likely. Let $Q(x_1,\ldots,x_t)$ be
the number of lists that remain empty after the objects have been placed into lists
$x_1$, …, $x_t$, and let $Z_n = \mathrm{E}(Q\mid X_1,\ldots,X_n)$ be the associated Doob martingale.
Then $Z_0 = \mathrm{E}(Q)$ is the average number of empty lists; and $Z_t = Q(X_1,\ldots,X_t)$
is the actual number, in any particular run of the algorithm.

What fair sequence corresponds to this martingale? If $1\le n\le t$, the random
variable $Y_n = Z_n - Z_{n-1}$ is $f_n(X_1,\ldots,X_n)$, where $f_n(x_1,\ldots,x_n)$ is the average
of

$$\Delta(x_1,\ldots,x_t) = \sum_{x=1}^{m}\Pr(X_n = x)\bigl(Q(x_1,\ldots,x_{n-1},x_n,x_{n+1},\ldots,x_t) - Q(x_1,\ldots,x_{n-1},x,x_{n+1},\ldots,x_t)\bigr) \tag{40}$$

taken over all $m^{t-n}$ values of $(x_{n+1},\ldots,x_t)$.

In our application the function $Q(x_1,\ldots,x_t)$ has the property that

$$|Q(x_1,\ldots,x_{n-1},x',x_{n+1},\ldots,x_t) - Q(x_1,\ldots,x_{n-1},x,x_{n+1},\ldots,x_t)| \le 1 \tag{41}$$

for all $x$ and $x'$, because a change to any one hash address always changes the
number of empty lists by either 1, 0, or $-1$. Consequently, for any fixed setting
of the variables $(x_1,\ldots,x_{n-1},x_{n+1},\ldots,x_t)$, we have

$$\max_{x_n}\Delta(x_1,\ldots,x_t) \le \min_{x_n}\Delta(x_1,\ldots,x_t) + 1. \tag{42}$$


The Hoeffding-Azuma inequality (37) therefore allows us to conclude that

$\Pr(Z_t - Z_0 \ge x) = \Pr(Y_1 + \cdots + Y_t \ge x) \le e^{-2x^2/t}$. (43)

Furthermore, $Z_0$ in this example is $m(m-1)^t/m^t$, because exactly $(m-1)^t$ of
the $m^t$ possible hash sequences leave any particular list empty. And the random
variable $Z_t$ is the actual number of empty lists when the algorithm is run. Hence
we can, for example, set $x = \sqrt{t \ln f(t)}$ in (43), thereby proving that

$\Pr\bigl(Z_t \ge (m-1)^t/m^{t-1} + \sqrt{t \ln f(t)}\bigr) \le 1/f(t)^2$, whenever $f(t) > 1$. (44)

The same upper bound applies to $\Pr\bigl(Z_t \le (m-1)^t/m^{t-1} - \sqrt{t \ln f(t)}\bigr)$.
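
To see how much room the tail bound (43) leaves, one can compute the exact distribution of the number of empty lists for small parameters. The sketch below is illustrative only; the names and the choice $m = 10$, $t = 20$ are arbitrary assumptions. It tracks the number of occupied lists as objects arrive one at a time, then compares the exact upper tail with $e^{-2x^2/t}$.

```python
from fractions import Fraction
from math import exp

def empty_list_distribution(m, t):
    """Exact distribution of the number of empty lists after t objects
    are hashed uniformly into m lists: {empties: probability}."""
    occupied = {0: Fraction(1)}            # occupied-count -> probability
    for _ in range(t):
        nxt = {}
        for k, pr in occupied.items():
            if k:                           # object lands in an occupied list
                nxt[k] = nxt.get(k, 0) + pr * Fraction(k, m)
            if k < m:                       # object lands in an empty list
                nxt[k + 1] = nxt.get(k + 1, 0) + pr * Fraction(m - k, m)
        occupied = nxt
    return {m - k: pr for k, pr in occupied.items()}

def upper_tail(dist, threshold):
    """Pr(Z_t >= threshold) under the exact distribution."""
    return sum(pr for z, pr in dist.items() if z >= threshold)

m, t = 10, 20
dist = empty_list_distribution(m, t)
z0 = Fraction(m * (m - 1) ** t, m ** t)    # Z_0 = m (m-1)^t / m^t
for x in range(1, 6):
    print(x, float(upper_tail(dist, z0 + x)), exp(-2 * x * x / t))
```

Each printed line shows $x$, the exact value of $\Pr(Z_t - Z_0 \ge x)$, and the bound from (43).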




Notice that the inequality (41) was crucial in this analysis. Therefore
the strategy we’ve used to prove (43) is often called the “method of bounded
differences.” In general, a function $Q(x_1, \ldots, x_t)$ is said to satisfy a Lipschitz
condition in coordinate $n$ if we have

$|Q(x_1, \ldots, x_{n-1}, x, x_{n+1}, \ldots, x_t) - Q(x_1, \ldots, x_{n-1}, x', x_{n+1}, \ldots, x_t)| \le c_n$ (45)

for all $x$ and $x'$. (This terminology mimics a well-known but only slightly
similar constraint that was introduced long ago into functional analysis by Rudolf
Lipschitz [Crelle 63 (1864), 296-308].) Whenever condition (45) holds, for a
function $Q$ associated with a Doob martingale for independent random variables
$X_1, \ldots, X_t$, we can prove that $\Pr(Y_1 + \cdots + Y_t \ge x) \le \exp(-2x^2/(c_1^2 + \cdots + c_t^2))$.

Let’s work out one more example, due to Colin McDiarmid [London Math.
Soc. Lecture Notes 141 (1989), 148-188, §8(a)]: Again we consider independent
integer-valued random variables $X_1, \ldots, X_t$ with $1 \le X_n \le m$ for $1 \le n \le t$;
but this time we allow each $X_n$ to have a different probability distribution.
Furthermore we define $Q(x_1, \ldots, x_t)$ to be the minimum number of bins into
which objects of sizes $x_1, \ldots, x_t$ can be packed, where each bin has capacity $m$.

This bin-packing problem sounds a lot harder than the hashing problem that
we just solved. Indeed, the task of evaluating $Q(x_1, \ldots, x_t)$ is well known to be
NP-complete [see M. R. Garey and D. S. Johnson, SICOMP 4 (1975), 397-411].
Yet $Q$ obviously satisfies the condition (45) with $c_n = 1$ for $1 \le n \le t$. Therefore
the method of bounded differences tells us that inequality (43) is true, in spite
of the apparent difficulty of this problem!
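
The claim that bin packing satisfies (45) with $c_n = 1$ can be confirmed exhaustively for tiny cases. The sketch below is illustrative only, with hypothetical function names. It evaluates $Q$ by backtracking, which takes exponential time, and then changes one coordinate at a time. The reason behind the claim is simple: deleting one object never increases the optimum, and a replacement object can always be given a bin of its own.

```python
from itertools import product

def min_bins(sizes, capacity):
    """Q(x_1..x_t): exact minimum number of bins, by exhaustive search.
    Exponential time; intended only for tiny inputs."""
    sizes = sorted(sizes, reverse=True)
    best = len(sizes)                       # one bin per object always works
    loads = []                              # current load of each open bin

    def place(i):
        nonlocal best
        if len(loads) >= best:
            return                          # cannot beat the best packing found
        if i == len(sizes):
            best = len(loads)
            return
        for b in range(len(loads)):         # try each bin that is already open
            if loads[b] + sizes[i] <= capacity:
                loads[b] += sizes[i]
                place(i + 1)
                loads[b] -= sizes[i]
        loads.append(sizes[i])              # or open a new bin
        place(i + 1)
        loads.pop()

    place(0)
    return best

def lipschitz_constant(m, t):
    """Largest change in Q caused by altering a single coordinate,
    over all vectors in {1..m}^t."""
    worst = 0
    for xs in product(range(1, m + 1), repeat=t):
        q = min_bins(xs, m)
        for n in range(t):
            for x in range(1, m + 1):
                ys = xs[:n] + (x,) + xs[n + 1:]
                worst = max(worst, abs(q - min_bins(ys, m)))
    return worst
```

For example, `lipschitz_constant(4, 4)` examines all 256 size vectors with bin capacity 4.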

The only difference between this bin-packing problem and the hashing problem
is that we’re clueless about the value of $Z_0$. Nobody knows how to compute
$E\,Q(X_1, \ldots, X_t)$, except for very special distributions of the random variables.
However — and this is the magic of martingales — we do know that, whatever the 
value is，the actual numbers Z t will be tightly concentrated around that average. 

If all the $X$’s have the same distribution, the values $\beta_t = E\,Q(X_1, \ldots, X_t)$
satisfy $\beta_{t+t'} \le \beta_t + \beta_{t'}$, because we could always pack the $t$ and $t'$ items separately.
Therefore, by the subadditive law (see the answer to exercise 2.5-39), $\beta_t/t$
approaches a limit $\beta$ as $t \to \infty$. Still, however, random trials won’t give us decent
bounds on that limit, because we have no good way to compute the $Q$ function.

If only he could have enjoyed Martingale for its beauty and its peace 
without being chained to it by this band of responsibility and guilt! 

— P. D. JAMES, Cover Her Face (1962) 

Statements that are almost sure, or quite sure. Probabilities that depend
on an integer $n$ often have the property that they approach 0 or 1 as $n \to \infty$,
and special terminology simplifies the discussion of such phenomena. If, say, $A_n$
is an event for which $\lim_{n\to\infty} \Pr(A_n) = 1$, it’s convenient to express this fact
in words by saying, “$A_n$ occurs almost surely, when $n$ is large.” (Indeed, we
usually don’t bother to state that $n$ is large, if we already understand that $n$ is
approaching infinity in the context of the current discussion.)

For example, if we toss a fair coin $n$ times, we’ll find that the coin almost
surely comes up heads more than $.49n$ times, but fewer than $.51n$ times.

Furthermore, we’ll occasionally want to express this concept tersely in formulas,
by writing just ‘a.s.’ instead of spelling out the words “almost surely.”
For instance, the statement just made about $n$ coin tosses can be formulated as

$.49n < X_1 + \cdots + X_n < .51n$ a.s., (46)

if $X_1, \ldots, X_n$ are independent binary random variables, each with $E\,X_j = 1/2$.
In general a statement such as “$A_n$ a.s.” means that $\lim_{n\to\infty} \Pr(A_n) = 1$; or,
equivalently, that $\lim_{n\to\infty} \Pr(\bar{A}_n) = 0$.

If $A_n$ and $B_n$ are both a.s., then the combined event $C_n = A_n \cap B_n$ is
also a.s., regardless of whether those events are independent. The reason is that
$\Pr(\bar{C}_n) = \Pr(\bar{A}_n \cup \bar{B}_n) \le \Pr(\bar{A}_n) + \Pr(\bar{B}_n)$, which approaches 0 as $n \to \infty$.

Thus, to prove (46) we need only show that $X_1 + \cdots + X_n > .49n$ a.s. and
that $X_1 + \cdots + X_n < .51n$ a.s., or in other words that $\Pr(X_1 + \cdots + X_n \le .49n)$
and $\Pr(X_1 + \cdots + X_n \ge .51n)$ both approach 0. Those probabilities are actually
equal, by symmetry between heads and tails; so we need only show that
$p_n = \Pr(X_1 + \cdots + X_n \le .49n)$ approaches 0. And that’s no sweat, because we know
from exercise 1.2.10-21 that $p_n \le e^{-.0001n}$.
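
The tail bound for $p_n$ is easy to compare with exact values, because $p_n$ is a finite sum of binomial coefficients. The following sketch is an illustration only; the function name and the sample values of $n$ are arbitrary choices. It uses exact integer arithmetic and prints $p_n$ beside $e^{-.0001n}$.

```python
from fractions import Fraction
from math import comb, exp

def lower_tail(n):
    """p_n = Pr(X_1 + ... + X_n <= .49n) for n fair coin tosses, exactly."""
    limit = (49 * n) // 100                # largest integer k with k <= .49n
    return Fraction(sum(comb(n, k) for k in range(limit + 1)), 2 ** n)

for n in (100, 1000, 10000):
    print(n, float(lower_tail(n)), exp(-0.0001 * n))
```

The comparison shows that the exponential bound is valid but far from tight at these sizes; its value lies in the asymptotic behavior, not in the constants.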

In fact, we’ve proved more: We’ve shown that $p_n$ is superpolynomially small,
namely that

$p_n = O(n^{-K})$ for all fixed numbers $K$. (47)

When the probability of an event $\bar{A}_n$ is superpolynomially small, we say that $A_n$
holds “quite surely,” and abbreviate that by ‘q.s.’. In other words, we’ve proved

$.49n < X_1 + \cdots + X_n < .51n$ q.s. (48)

We’ve seen that the combination of any two a.s. events is a.s.; hence the combination
of any finite number of a.s. events is also a.s. That’s nice, but q.s. events
are even nicer: The combination of any polynomial number of q.s. events is
also q.s. For example, if $n^4$ different people each toss $n$ coins, it is quite sure that
every one of them, without exception, will obtain between $.49n$ and $.51n$ heads!

(When making such asymptotic statements we ignore the inconvenient truth
that our bound on the failure of the assertion, $2n^4 e^{-.0001n}$ in this case, becomes
negligible only when $n$ is greater than 700,000 or so.)
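
The size of $n$ needed before that union bound means anything can be tabulated directly. The sketch below is illustrative; the function name and sample points are arbitrary. It evaluates $2n^4 e^{-.0001n}$ through its logarithm, to avoid multiplying a huge number by a tiny one.

```python
from math import exp, log

def failure_bound(n):
    """Union bound 2 n^4 e^{-.0001 n} on the chance that at least one of
    n^4 people obtains a number of heads outside (.49n, .51n)."""
    return exp(log(2) + 4 * log(n) - 0.0001 * n)

for n in (100_000, 500_000, 700_000, 1_000_000):
    print(n, failure_bound(n))
```

The bound still exceeds 1 at $n = 500{,}000$ and so says nothing there; by $n = 700{,}000$ it has fallen below $10^{-6}$.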

EXERCISES 

1. [M21] (Nontransitive dice.) Suppose three biased dice with the respective faces 



are rolled independently at random. 

a) Show that $\Pr(A>B) = \Pr(B>C) = \Pr(C>A) = 5/9$.

b) Find dice with $\Pr(A>B)$, $\Pr(B>C)$, $\Pr(C>A)$ all greater than 5/9.

c) If Fibonacci dice have $F_m$ faces instead of just six, show that we could have
$\Pr(A>B) = \Pr(B>C) = F_{m-1}/F_m$ and $\Pr(C>A) = F_{m-1}/F_m \pm 1/F_m^2$.

2. [M32] Prove that the previous exercise is asymptotically optimum, in the sense
that $\min(\Pr(A>B), \Pr(B>C), \Pr(C>A)) < 1/\phi$, regardless of the number of faces.

3. [22] (Lake Wobegon dice.) Continuing the previous exercises, find three dice such
that $\Pr(A > \frac13(A+B+C)) \ge \Pr(B > \frac13(A+B+C)) \ge \Pr(C > \frac13(A+B+C)) \ge 16/27$.
Each face of each die should be ⚀ or ⚁ or ⚂ or ⚃ or ⚄ or ⚅.

4. [22] (Nontransitive Bingo.) Each player in the game of NanoBingo has a card
containing four numbers from the set $S = \{1, 2, 3, 4, 5, 6\}$, arranged in two rows. An
announcer calls out the elements of $S$ in random order; the first player whose card has
a horizontal row with both numbers called shouts “Bingo!” and wins. (Or victory is
shared when there are multiple Bingoes.) For example, consider the four cards



[Figure: the four NanoBingo cards.]

If the announcer calls “6, 2, 5, 1” when A plays against B, then A wins; but the sequence
“1 ， 3, 2” would yield a tie. One can show that Pr(A beats B) = ||| ， Pr(B beats A )= 
HI, and Pr(A and B tie)= 备 . Determine the probabilities of all possible outcomes 
when there are (a) two (b) three (c) four different players using those cards.


► 5. [HM22] (T. M. Cover, 1989.) Common wisdom asserts that longer games favor
the stronger player, because they provide more evidence of the relative skills.
However, consider an $n$-round game in which Alice scores $A_1 + \cdots + A_n$ points while
Bob scores $B_1 + \cdots + B_n$ points, where each of $A_1, \ldots, A_n$ are independent random
variables representing Alice’s strength, and each of $B_1, \ldots, B_n$ independently represent
Bob’s (and are independent of the $A$’s). Suppose Alice wins with probability $P_n$.

a) Show that it’s possible to have $P_1 = .99$ but $P_{1000} < .0001$.

b) Let $m_k = 2^{k^3}$, $n_k = 2^{k^2+k}$, and $q_k = 2^{-k^2}/D$, where
$D = 2^{-0} + 2^{-1} + 2^{-4} + 2^{-9} + \cdots \approx 1.56447$. Suppose $A$ and $B$ are zero except that $A = m_k$ with probability
$q_k$ when $k \ge 0$ is even, $B = m_k$ with probability $q_k$ when $k \ge 1$ is odd. What are
$\Pr(A > B)$, $\Pr(A < B)$, and $\Pr(A = B)$?

c) With the distributions in (b), prove that $P_{n_k} \to [k \text{ even}]$ as $k \to \infty$.

► 6. [M22] Consider $n > 2$ random Boolean (or binary) variables $X_1 \ldots X_n$ with the
following joint distribution: The vector $x_1 \ldots x_n$ occurs with probability $1/(n-1)^2$ if
$x_1 + \cdots + x_n = 2$, with probability $(n-2)/(2n-2)$ if $x_1 + \cdots + x_n = 0$, and with
probability 0 otherwise. Show that the variables are pairwise independent (that is, $X_i$
is independent of $X_j$ when $i \ne j$); but they are not $k$-wise independent for $k > 2$.
Also find a joint distribution, depending only on $\nu x = x_1 + \cdots + x_n$, that is $k$-wise
independent for $k = 2$ and $k = 3$ but not $k = 4$.

7. [M30] (Ernst Schulte-Geers, 2012.) Generalizing exercise 6, construct a $\nu x$-based
distribution that has $k$-wise but not $(k+1)$-wise independence, given $k \ge 1$.

► 8. [M20] Suppose the Boolean vector $x_1 \ldots x_n$ occurs with probability $(2 + (-1)^{\nu x})/2^{n+1}$,
where $\nu x = x_1 + \cdots + x_n$. For what $k$ is this distribution $k$-wise independent?

9. [M20] Find a distribution of Boolean vectors $x_1 \ldots x_n$ such that any two variables
are dependent; yet if we know the value of any $x_j$, the remaining variables are $(n-1)$-wise
independent. Hint: The answer is so simple, you might feel hornswoggled.

► 10. [M21] Let $Y_1, \ldots, Y_m$ be independent and uniformly distributed elements of
$\{0, 1, \ldots, p-1\}$, where $p$ is prime. Also let $X_j = (j^m + Y_1 j^{m-1} + \cdots + Y_m) \bmod p$, for
$1 \le j \le n$. For what $k$ are the $X$’s $k$-wise independent?

11. [M20] If $X_1, \ldots, X_{2n}$ are independent random variables with the same discrete
distribution, and if $\alpha$ is any real number whatsoever, prove that

$\Pr\left( \left| \frac{X_1 + \cdots + X_{2n}}{2n} - \alpha \right| \le \left| \frac{X_1 + \cdots + X_n}{n} - \alpha \right| \right) > \frac12$.


12. [18] Which of the following four statements are equivalent to the statement that
$\Pr(A \mid B) > \Pr(A)$? (i) $\Pr(B \mid A) > \Pr(B)$; (ii) $\Pr(A \mid B) > \Pr(A \mid \bar{B})$; (iii) $\Pr(B \mid A) >
\Pr(B \mid \bar{A})$; (iv) $\Pr(\bar{A} \mid \bar{B}) > \Pr(\bar{A} \mid B)$.

13. [15] True or false: $\Pr(A \mid C) > \Pr(A)$ if $\Pr(A \mid B) > \Pr(A)$ and $\Pr(B \mid C) > \Pr(B)$.

14. [10] (Thomas Bayes, 1763.) Prove the “chain rule” for conditional probability:

$\Pr(A_1 \cap \cdots \cap A_n) = \Pr(A_1) \Pr(A_2 \mid A_1) \cdots \Pr(A_n \mid A_1 \cap \cdots \cap A_{n-1})$.

15. [12] True or false: $\Pr(A \mid B \cap C) \Pr(B \mid C) = \Pr(A \cap B \mid C)$.

16. [M15] Under what circumstances is $\Pr(A \mid B) = \Pr(A \cup C \mid B)$?

► 17. [15] Evaluate the conditional probability $\Pr(T \text{ is an ace} \mid B = \mathrm{Q}\spadesuit)$ in the playing
card example of the text, where $T$ and $B$ denote the top and bottom cards.

18. [20] Let $M$ and $m$ be the maximum and minimum values of the random variable $X$.
Prove that $\mathrm{var}(X) \le (M - E\,X)(E\,X - m)$.

► 19. [HM28] Let $X$ be a random nonnegative integer, with $\Pr(X = x) = 1/2^{x+1}$, and
suppose that $X = (\ldots X_2 X_1 X_0)_2$ and $X + 1 = (\ldots Y_2 Y_1 Y_0)_2$ in binary notation.

a) What is $E\,X_n$? Hint: Express this number in the binary number system.

b) Prove that the random variables $\{X_0, X_1, \ldots, X_{n-1}\}$ are independent.

c) Find the mean and variance of $S = X_0 + X_1 + X_2 + \cdots$.

d) Find the mean and variance of $X_0 \oplus X_1 \oplus X_2 \oplus \cdots$.

e) Let $\pi = (11.p_0 p_1 p_2 \ldots)_2$. What is the probability that $X_n = p_n$ for all $n \ge 0$?

f) What is $E\,Y_n$? Show that $Y_0$ and $Y_1$ are not independent.

g) Find the mean and variance of $T = Y_0 + Y_1 + Y_2 + \cdots$.

20. [M18] Let $X_1, \ldots, X_n$ be binary random variables for which we know that
$E(\prod_{j \in J} X_j) = \prod_{j \in J} E\,X_j$ for all $J \subseteq \{1, \ldots, n\}$. Prove that the $X$’s are independent.

21. [M20] Find a small-as-possible example of random variables $X$ and $Y$ that satisfy
$\mathrm{covar}(X, Y) = 0$, that is, $E\,XY = (E\,X)(E\,Y)$, although they aren’t independent.

22. [M20] Use Eq. (8) to prove the “union inequality”

$\Pr(A_1 \cup \cdots \cup A_n) \le \Pr(A_1) + \cdots + \Pr(A_n)$.

► 23. [M21] If each $X_k$ is an independent binary random variable with $E\,X_k = p$, the
cumulative binomial distribution $B_{m,n}(p)$ is the probability that $X_1 + \cdots + X_n \le m$.
Thus it’s easy to see that $B_{m,n}(p) = \sum_{k=0}^{m} \binom{n}{k} p^k (1-p)^{n-k}$.

Show that $B_{m,n}(p)$ is also equal to $\sum_{k=0}^{m} \binom{n-m-1+k}{k} p^k (1-p)^{n-m}$, for $0 \le m < n$.
Hint: Consider the random variables $J_1, J_2, \ldots,$ and $T$ defined by the rule that $X_j = 0$
if and only if $j$ has one of the $T$ values $\{J_1, J_2, \ldots, J_T\}$, where $1 \le J_1 < J_2 < \cdots <
J_T \le n$. What is $\Pr(T \ge r \text{ and } J_r = s)$?

24. [HM27] The cumulative binomial distribution also has many other properties.

a) Prove that $B_{m,n}(p) = (n-m) \binom{n}{m} \int_p^1 x^m (1-x)^{n-1-m}\,dx$, for $0 \le m < n$.

b) Use that formula to prove that $B_{m,n}(m/n) > \frac12$, for $0 \le m < n/2$. Hint: Show
that $\int_0^{m/n} x^m (1-x)^{n-1-m}\,dx < \int_{m/n}^1 x^m (1-x)^{n-1-m}\,dx$.

c) Show furthermore that $B_{m,n}(m/n) > \frac12$ when $n/2 \le m \le n$. [Thus $m$ is the
median value of $X_1 + \cdots + X_n$, when $p = m/n$ and $m$ is an integer.]


25. [M25] Suppose $X_1, X_2, \ldots$ are independent random binary variables, with means
$E\,X_k = p_k$. Let $\bigl(\!\binom{n}{k}\!\bigr)$ be the probability that $X_1 + \cdots + X_n = k$; thus
$\bigl(\!\binom{n}{k}\!\bigr) = p_n \bigl(\!\binom{n-1}{k-1}\!\bigr) + q_n \bigl(\!\binom{n-1}{k}\!\bigr) = [z^k]\,(q_1 + p_1 z) \cdots (q_n + p_n z)$, where $q_k = 1 - p_k$.

a) Prove that $\bigl(\!\binom{n}{k}\!\bigr) \ge \bigl(\!\binom{n}{k+1}\!\bigr)$, if $p_j \le (k+1)/(n+1)$ for $1 \le j \le n$.

b) Furthermore $\bigl(\!\binom{n}{k}\!\bigr) \le \binom{n}{k} p^k q^{n-k}$, if $p_j \le p \le k/n$ for $1 \le j \le n$.

26. [M27] Continuing exercise 25, prove that $\bigl(\!\binom{n}{k}\!\bigr)^2 \ge \bigl(\!\binom{n}{k-1}\!\bigr) \bigl(\!\binom{n}{k+1}\!\bigr) \bigl(1 + \frac1k\bigr) \bigl(1 + \frac{1}{n-k}\bigr)$
for $0 < k < n$. Hint: Consider $r_{n,k} = \bigl(\!\binom{n}{k}\!\bigr) / \binom{n}{k}$.

27. [M22] Find an expression for the generalized cumulative binomial distribution
$\sum_{k=0}^{m} \bigl(\!\binom{n}{k}\!\bigr)$ that is analogous to the alternative formula in exercise 23.

28. [HM28] (W. Hoeffding, 1956.) Let $X = X_1 + \cdots + X_n$ and $p_1 + \cdots + p_n = np$ in
exercise 25, and suppose that $E\,g(X) = \sum_{k=0}^{n} g(k) \bigl(\!\binom{n}{k}\!\bigr)$ for some function $g$.

a) Prove that $E\,g(X) \le \sum_{k=0}^{n} g(k) \binom{n}{k} p^k (1-p)^{n-k}$ if $g$ is convex in $[0\,..\,n]$.

b) If $g$ isn’t convex, show that the maximum of $E\,g(X)$, over all choices of $\{p_1, \ldots, p_n\}$
with $p_1 + \cdots + p_n = np$ can always be attained by a set of probabilities for which
at most three distinct values $\{0, a, 1\}$ occur among the $p_j$.

c) Furthermore $\sum_{k=0}^{m} \bigl(\!\binom{n}{k}\!\bigr) \le B_{m,n}(p)$, whenever $p_1 + \cdots + p_n = np \ge m + 1$.

29. [HM29] (S. M. Samuels, 1965.) Continuing exercise 28, prove that we have 

B m ,n(p) > ((1 - 1)/((1 - p)m + l)) n_m whenever np < m-h 1 . 

30. [HM34] Let $X_1, \ldots, X_n$ be independent random variables whose values are nonnegative
integers, where $E\,X_k = 1$ for all $k$, and let $p = \Pr(X_1 + \cdots + X_n \le n)$.

a) What is $p$, if each $X_k$ takes only the values 0 and $n + 1$?

b) Show that, in any set of distributions that minimize $p$, each $X_k$ assumes only two
integer values, 0 and $m_k$, where $1 \le m_k \le n + 1$.

c) Furthermore we have $p > 1/e$, if each $X_k$ has the same two-valued distribution.

► 31. [M20] Assume that $A_1, \ldots, A_n$ are random events such that, for every subset
$I \subseteq \{1, \ldots, n\}$, the probability $\Pr(\bigcap_{i \in I} A_i)$ that all $A_i$ for $i \in I$ occur simultaneously
is $\pi_I$; here $\pi_I$ is a number with $0 \le \pi_I \le 1$, and $\pi_\emptyset = 1$. Show that the probability of
any combination of the events, $\Pr(f([A_1], \ldots, [A_n]))$ for any Boolean function $f$, can be
found by expanding $f$’s multilinear reliability polynomial $f([A_1], \ldots, [A_n])$ and replacing
each term $\prod_{i \in I} [A_i]$ by $\pi_I$. For example, the reliability polynomial of $x_1 \oplus x_2 \oplus x_3$ is
$x_1 + x_2 + x_3 - 2x_1x_2 - 2x_1x_3 - 2x_2x_3 + 4x_1x_2x_3$; hence $\Pr([A_1] \oplus [A_2] \oplus [A_3]) =
\pi_1 + \pi_2 + \pi_3 - 2\pi_{12} - 2\pi_{13} - 2\pi_{23} + 4\pi_{123}$. (Here ‘$\pi_{12}$’ is short for $\pi_{\{1,2\}}$, etc.)

32. [M21] Not all sets of numbers $\pi_I$ in the preceding exercise can arise in an actual
probability distribution. For example, if $I \subseteq J$ we must have $\pi_I \ge \pi_J$. What is a
necessary and sufficient condition for the $2^n$ values of $\pi_I$ to be legitimate?

33. [M20] Suppose $X$ and $Y$ are binary random variables whose joint distribution is
defined by the probability generating function $G(w, z) = E(w^X z^Y) = pw + qz + rwz$,
where $p, q, r > 0$ and $p + q + r = 1$. Use the definitions in the text to compute the
probability generating function $E(z^{E(X \mid Y)})$ for the conditional expectation $E(X \mid Y)$.

34. [M17] Write out an algebraic proof of (12), using the definitions (7) and (13).

► 35. [M22] True or false: (a) $E(E(X \mid Y) \mid Y) = E(X \mid Y)$; (b) $E(E(X \mid Y) \mid Z) = E(X \mid Z)$.

36. [M21] Simplify the formulas (a) $E(f(X) \mid X)$; (b) $E(f(Y)\,E(g(X) \mid Y))$.

► 37. [M20] Suppose $X_1 \ldots X_n$ is a random permutation of $\{1, \ldots, n\}$, with every permutation
occurring with probability $1/n!$. What is $E(X_k \mid X_1, \ldots, X_{k-1})$?

38. [M26] Let $X_1 \ldots X_n$ be a random restricted growth string of length $n$, each with
probability $1/\varpi_n$ (see Section 7.2.1.5). What is $E(X_k \mid X_1, \ldots, X_{k-1})$?

► 39. [HM21] A hen lays $N$ eggs, where $\Pr(N = n) = e^{-\mu} \mu^n/n!$ obeys the Poisson
distribution. Each egg hatches with probability $p$, independent of all other eggs. Let
$K$ be the resulting number of chicks. Express (a) $E(K \mid N)$, (b) $E\,K$, and (c) $E(N \mid K)$
in terms of $N$, $K$, $\mu$, and $p$.

40. [M16] Suppose $X$ is a random variable with $X \le M$, and let $m$ be any value with
$m < M$. Show that $\Pr(X > m) \ge (E\,X - m)/(M - m)$.

41. [HM21] Which of the following functions are convex in the set of all real numbers $x$?
(a) $|x|^a$, where $a$ is a constant; (b) $\sum_{k \ge n} x^k/k!$, where $n \ge 0$ is an integer;
(c) $e^{e^{|x|}}$; (d) $f(x)[x \in I] + \infty[x \notin I]$, where $f$ is convex in the interval $I$.

42. [HM21] Prove Jensen’s inequality (20).

► 43. [M18] Use (12) and (20) to strengthen (20): If $f$ is convex in $I$ and if the random
variable $X$ takes values in $I$, then $f(E\,X) \le E(f(E(X \mid Y))) \le E(f(X))$.

► 44. [M25] If $f$ is convex on the real line and if $E\,X = 0$, prove that $E\,f(aX) \le E\,f(bX)$
whenever $0 \le a \le b$.

45. [M18] Derive the first moment principle (21) from Markov’s inequality (15).

46. [M15] Explain why $E(X^2 \mid X > 0) \ge (E(X \mid X > 0))^2$ in (23).

47. [M15] If $X$ is random and $Y = \max(0, X)$, show that $E\,Y \ge E\,X$ and $E\,Y^2 \le E\,X^2$.

► 48. [M20] Suppose $X_1, \ldots, X_n$ are independent random variables with $E\,X_k = 0$ and
$E\,X_k^2 = \sigma_k^2$ for $1 \le k \le n$. Chebyshev’s inequality tells us that $\Pr(|X_1 + \cdots + X_n| \ge a) \le
(\sigma_1^2 + \cdots + \sigma_n^2)/a^2$; show that the second moment principle gives a somewhat better
one-sided estimate, $\Pr(X_1 + \cdots + X_n \ge a) \le (\sigma_1^2 + \cdots + \sigma_n^2)/(a^2 + \sigma_1^2 + \cdots + \sigma_n^2)$, if $a \ge 0$.

49. [M20] If $X$ is random and $\ge 0$, prove that $\Pr(X = 0) \le (E\,X^2)/(E\,X)^2 - 1$.

► 50. [M27] Let $X = X_1 + \cdots + X_m$ be the sum of binary random variables, with
$E\,X_j = p_j$. Let $J$ be independent of the $X$’s, and uniformly distributed in $\{1, \ldots, m\}$.

a) Prove that $\Pr(X > 0) = \sum_{j=1}^{m} E(X_j/X \mid X_j > 0) \cdot \Pr(X_j > 0)$.

b) Therefore (24) holds. Hint: Use Jensen’s inequality with $f(x) = 1/x$.

c) What are $\Pr(X_J = 1)$ and $\Pr(J = j \mid X_J = 1)$?

d) Let $t_j = E(X \mid J = j \text{ and } X_J = 1)$. Prove that $E\,X^2 = \sum_{j=1}^{m} p_j t_j$.

e) Jensen’s inequality now implies that the right side of (24) is $\ge (E\,X)^2/(E\,X^2)$.

► 51. [M21] Show how to use the conditional expectation inequality ( 24 ) to obtain also 
an upper bound on the value of a reliability polynomial, and apply your method to the 
case illustrated in ( 25 ). 

52. [M21] What lower bound does inequality (24) give for the reliability polynomial
of the symmetric function $S_{\ge k}(x_1, \ldots, x_n)$, when $p_1 = \cdots = p_n = p$?

53. [M20] Use (24) to obtain a lower bound for the reliability polynomial of the nonmonotonic
Boolean function $f(x_1, \ldots, x_6) = x_1 x_2 \bar{x}_3 \lor x_2 x_3 \bar{x}_4 \lor \cdots \lor x_5 x_6 \bar{x}_1 \lor x_6 x_1 \bar{x}_2$.

► 54. [M22] Suppose each edge of a random graph on the vertices $\{1, \ldots, n\}$ is present
with probability $p$, independent of every other edge. If $u$, $v$, $w$ are distinct vertices,
let $X_{uvw}$ be the probability that $\{u, v, w\}$ is a 3-clique, namely the probability that
$u$ -- $v$, $u$ -- $w$, and $v$ -- $w$. Also let $X = \sum_{1 \le u < v < w \le n} X_{uvw}$ be the total number of
3-cliques. Use the (a) first and (b) second moment principle to derive bounds on the
probability that the graph contains at least one 3-clique.

55. [23] Evaluate the upper and lower bounds in the previous exercise numerically
in the case $n = 10$, and compare them to the true probability, when (a) $p = 1/2$;
(b) $p = 1/10$.

56. [HM20] Evaluate the upper and lower bounds of exercise 54 asymptotically when
$p = \lambda/n$ and $n \to \infty$.

► 57. [M21] Obtain a lower bound for the probability in exercise 54(b) by using the
conditional expectation inequality (24) instead of the second moment principle (22).

58. [M22] Generalizing exercise 54, find bounds on the probability that a random
graph on $n$ vertices has a $k$-clique, when each edge has probability $p$.

► 59. [HM25] (The four functions theorem.) The purpose of this exercise is to prove an
inequality that applies to four sequences $\langle a_n \rangle$, $\langle b_n \rangle$, $\langle c_n \rangle$, $\langle d_n \rangle$ of nonnegative numbers:

$a_j b_k \le c_{j|k}\, d_{j\&k}$ for $0 \le j, k < \infty$ implies $\sum_{j=0}^{\infty} \sum_{k=0}^{\infty} a_j b_k \le \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} c_j d_k$. (*)

(The sums will be $\infty$ if they don’t converge.) Although the inequality might appear at
first to be merely a curiosity, of interest only to a few lovers of esoteric formulas, we
shall see that it’s a fundamental result with many applications of great importance.

a) Prove the special case where $a_j = b_j = c_j = d_j = 0$ for $j \ge 2$, namely that

$a_0 b_0 \le c_0 d_0$, $a_0 b_1 \le c_1 d_0$, $a_1 b_0 \le c_1 d_0$, and $a_1 b_1 \le c_1 d_1$

implies $(a_0 + a_1)(b_0 + b_1) \le (c_0 + c_1)(d_0 + d_1)$.

Can equality hold in the first four relations but not in the last one? Can equality
hold in the last relation but not in the first four?

b) Use that result to prove (*) when $a_j = b_j = c_j = d_j = 0$ for all $j \ge 2^n$, given $n > 0$.

c) Conclude that (*) is true in general.

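► 60. [M21] If $\mathcal{F}$ is a family of sets, and if $\alpha$ is a function that maps sets into real
numbers, let $\alpha(\mathcal{F}) = \sum_{S \in \mathcal{F}} \alpha(S)$. Suppose $\mathcal{F}$ and $\mathcal{G}$ are finite families of sets for which
nonnegative set functions $\alpha$, $\beta$, $\gamma$, and $\delta$ have been defined with the property that

$\alpha(S)\,\beta(T) \le \gamma(S \cup T)\,\delta(S \cap T)$ for all $S \in \mathcal{F}$ and $T \in \mathcal{G}$.

a) Use exercise 59 to prove that $\alpha(\mathcal{F})\,\beta(\mathcal{G}) \le \gamma(\mathcal{F} \sqcup \mathcal{G})\,\delta(\mathcal{F} \sqcap \mathcal{G})$.

b) In particular, $|\mathcal{F}|\,|\mathcal{G}| \le |\mathcal{F} \sqcup \mathcal{G}|\,|\mathcal{F} \sqcap \mathcal{G}|$ for all families $\mathcal{F}$ and $\mathcal{G}$.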

► 61. [M28] Consider random sets in which $S$ occurs with probability $\mu(S)$, where

$\mu(S) \ge 0$ and $\mu(S)\,\mu(T) \le \mu(S \cup T)\,\mu(S \cap T)$ for all sets $S$ and $T$. (**)

Assume also that $U = \bigcup_{\mu(S) > 0} S$ is a finite set.

a) Prove the FKG inequality (which is named for C. M. Fortuin, P. W. Kasteleyn,
and J. Ginibre): If $f$ and $g$ are real-valued set functions, then

$f(S) \le f(T)$ and $g(S) \le g(T)$ for all $S \subseteq T$ implies $E(fg) \ge E(f)\,E(g)$.

Here, as usual, $E(f)$ stands for $\sum_S \mu(S) f(S)$. The conclusion can also be written
‘$\mathrm{covar}(f, g) \ge 0$’, using the notation of (9); we say that $f$ and $g$ are “positively
correlated” when this is true. (The awkward term “nonnegatively correlated”
would be more accurate, because $f$ and $g$ might actually be independent.) Hint:
Prove the result first in the special case that both $f$ and $g$ are nonnegative.

b) Furthermore,

$f(S) \ge f(T)$ and $g(S) \ge g(T)$ for all $S \subseteq T$ implies $E(fg) \ge E(f)\,E(g)$;

$f(S) \le f(T)$ and $g(S) \ge g(T)$ for all $S \subseteq T$ implies $E(fg) \le E(f)\,E(g)$.

c) It isn’t necessary to verify condition (**) for all sets, if (**) is known to hold
for sufficiently many pairs of “neighboring” sets. Given $\mu$, let’s say that set $S$ is
supported if $\mu(S) \ne 0$. Prove that (**) holds for all $S$ and $T$ whenever the following
three conditions are satisfied: (i) If $S$ and $T$ are supported, so are $S \cup T$ and $S \cap T$.
(ii) If $S$ and $T$ are supported and $S \subseteq T$, the elements of $T \setminus S$ can be labeled
$t_1, \ldots, t_k$ such that each of the intermediate sets $S \cup \{t_1, \ldots, t_j\}$ is supported, for
$1 \le j \le k$. (iii) Condition (**) holds whenever $S = R \cup s$ and $T = R \cup t$ and $s, t \notin R$.

d) The multivariate Bernoulli distribution $B(p_1, \ldots, p_m)$ on subsets of $\{1, \ldots, m\}$ is

$\mu(S) = \Bigl( \prod_{j=1}^{m} p_j^{[j \in S]} \Bigr) \Bigl( \prod_{j=1}^{m} (1 - p_j)^{[j \notin S]} \Bigr)$,

given $0 \le p_1, \ldots, p_m \le 1$. (Thus each element $j$ is included independently with
probability $p_j$, as in exercise 25.) Show that this distribution satisfies (**).

e) Describe other simple distributions for which (**) holds.

► 62. [M20] Suppose the $m = \binom{n}{2}$ edges $E$ of a random graph $G$ on $n$ vertices are
chosen with the Bernoulli distribution $B(p_1, \ldots, p_m)$. Let $f(E) = [G \text{ is connected}]$ and
$g(E) = [G \text{ is 4-colorable}]$. Prove that $f$ is negatively correlated with $g$.

63. [M17] Suppose $Z_0$ and $Z_1$ are random ternary variables with $\Pr(Z_0 = a \text{ and }
Z_1 = b) = p_{ab}$ for $0 \le a, b \le 2$, where $p_{00} + p_{01} + \cdots + p_{22} = 1$. What can you say
about those probabilities $p_{ab}$ when $E(Z_1 \mid Z_0) = Z_0$?

► 64. [M22] (a) If $E(Z_{n+1} \mid Z_n) = Z_n$ for all $n \ge 0$, is $\langle Z_n \rangle$ a martingale? (b) If $\langle Z_n \rangle$ is
a martingale, is $E(Z_{n+1} \mid Z_n) = Z_n$ for all $n \ge 0$?

65. [M21] If $\langle Z_n \rangle$ is any martingale, show that any subsequence $\langle Z_{m(n)} \rangle$ is also a
martingale, where the nonnegative integers $\langle m(n) \rangle$ satisfy $m(0) < m(1) < m(2) < \cdots$.

► 66. [M22] Find all martingales $Z_0, Z_1, \ldots$ such that each random variable $Z_n$ assumes
only the values $\pm n$.

67. [M20] The Equitable Bank of El Dorado features a money machine such that, if
you insert $k$ dollars, you receive $2k$ dollars back with probability exactly 1/2; otherwise
you get nothing. Thus you either gain \$$k$ or lose \$$k$, and your expected profit is \$0.
(Of course these transactions are all done electronically.)

a) Consider, however, the following scheme: Insert \$1; if that loses, insert \$2; if that
also loses, insert \$4; then \$8, etc. If you first succeed after inserting $2^n$ dollars,
stop (and take the $2^{n+1}$ dollars). What’s your expected net profit at the end?

b) Continuing (a), what’s the expected total amount that you put into the machine?

c) If $Z_n$ is your net profit after $n$ trials, show that $\langle Z_n \rangle$ is a martingale.

68. [HM23] When J. H. Quick (a student) visited El Dorado, he decided to proceed
by making repeated bets of \$1 each, and to stop when he first came out ahead. (He was
in no hurry, and was well aware of the perils of the high-stakes strategy in exercise 67.)

a) What martingale $\langle Z_n \rangle$ corresponds to this more conservative strategy?

b) Let $N$ be the number of bets that Quick made before stopping. What is the
probability that $N = n$?

c) What is the probability that $N > n$?

d) What is $E\,N$?

e) What is the probability that $\min(Z_0, Z_1, \ldots) = -m$? (Possible “gambler’s ruin.”)

f) What is the expected number of indices $n$ such that $Z_n = -m$, given $m > 0$?

69. [M20] Section 1.2.5 discusses two basic ways by which we can go from permutations
of $\{1, \ldots, n-1\}$ to permutations of $\{1, \ldots, n\}$: “Method 1” inserts $n$ among the
previous elements in all possible ways; “Method 2” puts a number $k$ from 1 to $n$ in the
final position, and adds 1 to each previous number that was $\ge k$.
Show that, using either method, every permutation can be associated with a node
of Fig. 1, using a rule that obeys the probability assumptions of Pólya’s urn model.

70. [M25] If Pólya’s urn model is generalized so that we start with $c$ balls of different
colors, is there a martingale that generalizes Fig. 1?

71. [M21] (G. Pólya.) What is the probability of going from node $(r, b)$ to node $(r', b')$
in Fig. 1, given $r$, $r'$, $b$, and $b'$ with $r' \ge r$ and $b' \ge b$?

72. [M21] Let $X_n$ be the red-ball indicator for Pólya’s urn, as discussed in the text.
What is $E(X_{n_1} X_{n_2} \ldots X_{n_m})$ when $0 < n_1 < n_2 < \cdots < n_m$?

73. [M24] The ratio $Z_n = r/(n+2)$ at node $(r, n+2-r)$ of Fig. 1 is not the only martingale
definable on Pólya’s urn. For example, $r[n = r-1]$ is another; so is $r\binom{n+1}{r}/2^n$.

Find the most general martingale $\langle Z_n \rangle$ for this model: Given any sequence $a_0, a_1,
\ldots,$ show that there’s exactly one suitable function $Z_n = f(r, n)$ such that $f(1, k) = a_k$.

74. [M20] (Bernard Friedman’s urn.) Instead of contributing a ball of the same color,
as in Fig. 1, suppose we use the opposite color. Then the process changes to

[Figure: the first levels of Friedman’s urn, Level 0 through Level 3.]

and the probabilities of reaching each node become quite different. What are they?

75. [M25] Find an interesting martingale for Bernard Friedman’s urn. 

76. [M20] If $\langle Z_n \rangle$ and $\langle Z'_n \rangle$ are martingales, is $\langle Z_n + Z'_n \rangle$ a martingale?

77. [M21] Prove or disprove: If $\langle Z_n \rangle$ is a martingale with respect to $\langle X_n \rangle$, then $\langle Z_n \rangle$
is a martingale with respect to itself (that is, a martingale).

78. [M20] A sequence of random variables $\langle V_n \rangle$ for which $E(V_{n+1} \mid V_0, \ldots, V_n) = 1$
is called “multiplicatively fair.” Show that $Z_n = V_0 V_1 \ldots V_n$ is a martingale in such
a case. Conversely, does every martingale lead to a multiplicatively fair sequence?

79. [M20] (De Moivre’s martingale.) Let $X_1, X_2, \ldots$ be a sequence of independent
coin tosses, with $\Pr([\text{“heads” occurred on the $n$th toss}]) = \Pr(X_n = 1) = p$ for each $n$.
Show that $Z_n = (q/p)^{2(X_1 + \cdots + X_n) - n}$ defines a martingale, where $q = 1 - p$.

80. [M20] Are the following statements true or false for every fair sequence (Y n )? 

(a) E(YiY 5 ) = 0. (b) E(y 3 y 5 2 ) = 0. (c) E(y ni y n2 ... Y n J = O if m < n 2 < ... < n m . 

81. [M21] Suppose $E(X_{n+1} \mid X_0, \ldots, X_n) = X_n + X_{n-1}$ for $n \ge 0$, where $X_{-1} = 0$.
Find sequences $a_n$ and $b_n$ of coefficients so that $Z_n = a_n X_n + b_n X_{n-1}$ is a martingale,
where $Z_0 = X_0$ and $Z_1 = 2X_0 - X_1$. (We might call this a “Fibonacci martingale.”)

► 82. [M20] In the game of Ace Now, let $X_n = [\text{the $n$th card is an ace}]$, with $X_0 = 0$.

a) Show that $Z_n = (4 - X_1 - \cdots - X_n)/(52 - n)$ satisfies (28) for $0 \le n < 52$.

b) Consequently $E\,Z_N = 1/13$, regardless of the stopping rule employed.

c) Hence all strategies are equally good (or bad); you win \$0 on average.

► 83. [HM22] Given a sequence $\langle X_n \rangle$ of independent and nonnegative random variables,
let $S_n = X_1 + \cdots + X_n$. If $N_n(x_0, \ldots, x_{n-1})$ is any stopping rule and if $N$ is defined
by (31), prove that $E\,S_N = E \sum_{k=1}^{N} E\,X_k$. (In particular, if $E\,X_n = E\,X_1$ for all $n > 0$
we have “Wald’s equation,” which states that $E\,S_N = (E\,N)(E\,X_1)$.)

84. [HM21] Let $f(x)$ be a convex function for $a \le x \le b$, and assume that $\langle Z_n \rangle$ is a
martingale such that $a \le Z_n \le b$ for all $n \ge 0$. (Possibly $a = -\infty$ and/or $b = +\infty$.)

a) Prove that $\langle f(Z_n) \rangle$ is a submartingale.

b) What can you say if the sequence $\langle Z_n \rangle$ is assumed only to be a submartingale?

85. [M20] Suppose there are $R_n$ red balls and $B_n$ black balls at level $n$ of Pólya’s urn
(Fig. 1). Prove that the sequence $\langle R_n/B_n \rangle$ is a submartingale.

► 86. [M22] Prove (33) by inventing a suitable stopping rule $N_{n+1}(Z_0, \ldots, Z_n)$.

87. [M17] What does the maximal inequality (33) reveal about the chances that
Pólya’s urn will hold thrice as many red balls as black balls at some point?

► 88. [HM30] Let $S = \sup Z_n$ be the least upper bound of $Z_n$ as $n \to \infty$ in Fig. 1.

a) Prove that $S > 1/2$ with probability $\ln 2 \approx .693$.

b) Similarly, show that $\Pr(S > 2/3) = \ln 3 - \pi/\sqrt{27} \approx .494$.

c) Generalize to $\Pr(S > (t-1)/t)$, for all $t \ge 2$. Hint: See exercise 7.2.1.6-36.

89. [M16] Let $(X_1, \ldots, X_n)$ be random variables that have the Bernoulli distribution
$B(p_1, \ldots, p_n)$. Use (37) to show that $\Pr(X_1 + \cdots + X_n \ge p_1 + \cdots + p_n + x) \le e^{-2x^2/n}$.

90. [HM25] The Hoeffding-Azuma inequality (37) can be derived as follows:

a) Show first that $\Pr(Y_1 + \cdots + Y_n \ge x) \le E(e^{(Y_1 + \cdots + Y_n)t})/e^{tx}$ for all $t > 0$.

b) If $0 \le p \le 1$ and $q = 1 - p$, show that $e^{yt} \le e^{f(t)} + y e^{g(t)}$ when $-p \le y \le q$ and
$t > 0$, where $f(t) = -pt + \ln(q + p e^t)$ and $g(t) = -pt + \ln(e^t - 1)$.

c) Prove that $f(t) \le t^2/8$. Hint: Use Taylor’s formula, Eq. 1.2.11.3-(5).

d) Consequently $a \le Y \le b$ implies $e^{Yt} \le e^{(b-a)^2 t^2/8} + Y h(t)$, for some function $h(t)$.

e) Let $c = (c_1^2 + \cdots + c_n^2)/2$, where $c_k = b_k - a_k$. Prove that $E(e^{(Y_1 + \cdots + Y_n)t}) \le e^{ct^2/4}$.

f) We obtain (37) by choosing the best value of $t$.

91. [M20] Prove that Doob’s general formula (39) always defines a martingale.

► 92. [M20] Let $\langle Q_n\rangle$ be the Doob martingale that corresponds to Pólya’s urn (27)
when $Q = X_m$, for some fixed $m > 0$. Calculate $Q_0$, $Q_1$, $Q_2$, etc.

93. [M20] Solve the text’s hashing problem under the more general model considered
in the bin-packing problem: Each variable $X_n$ has probability $p_{nk}$ of being equal to $k$,
for $1 \le n \le t$ and $1 \le k \le m$. What formula do you get instead of (44)?

► 94. [M22] Where is the fact that the variables $\{X_1, \ldots, X_t\}$ are independent used in
the previous exercise?

95. [M20] True or false: “Pólya’s urn q.s. accumulates more than 100 red balls.”

96. [HM22] Let $X$ be the number of heads seen in $n$ flips of an unbiased coin. Decide
whether each of the following statements about $X$ is a.s., q.s., or neither, as $n \to \infty$:

(i) $|X - n/2| < \sqrt{n}\,\ln n$; (ii) $|X - n/2| < \sqrt{n \ln n}$;

(iii) $|X - n/2| < \sqrt{n \ln\ln n}$; (iv) $|X - n/2| < \sqrt{n}$.

► 97. [HM21] Suppose $\lfloor n^{1+\delta}\rfloor$ items are hashed into $n$ bins, where $\delta$ is a positive
constant. Prove that every bin q.s. gets between $\frac12 n^\delta$ and $2n^\delta$ of them.
► 98. [M21] Many algorithms are governed by a loop of the form

$X \gets n$; while $X > 0$, set $X \gets X - F(X)$

where $F(X)$ is a random integer in the range $[1\,..\,X]$. We assume that each integer
$F(X)$ is completely independent of any previously generated values, subject only to
the requirement that $\mathrm{E}\,F(j) \ge g_j$, where $0 < g_1 \le g_2 \le \cdots \le g_n$.

Prove that the loop sets $X \gets X - F(X)$ at most $1/g_1 + 1/g_2 + \cdots + 1/g_n$ times, on
the average. (“If one step reduces by $g_n$, then perhaps $(1/g_n)$th of a step reduces by 1.”)
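
To see the claim of exercise 98 on one concrete instance, the sketch below takes $F(X)$ uniform on $[1\,..\,X]$ (an assumption chosen for this example, giving $\mathrm{E}\,F(j) = (j+1)/2$, which we use as $g_j$) and computes the mean number of iterations exactly. The function names are hypothetical.

```python
from fractions import Fraction

def expected_steps_uniform(n: int) -> Fraction:
    """Exact mean number of loop iterations when F(X) is uniform on [1..X]."""
    t = [Fraction(0)] * (n + 1)          # t[x] = expected iterations starting from X = x
    for x in range(1, n + 1):
        # One step is taken; then X becomes x - f for f = 1..x, each with probability 1/x.
        t[x] = 1 + sum(t[x - f] for f in range(1, x + 1)) / x
    return t[n]

def claimed_bound(n: int) -> Fraction:
    """1/g_1 + ... + 1/g_n with g_j = (j + 1)/2, the mean of the uniform F(j)."""
    return sum(Fraction(2, j + 1) for j in range(1, n + 1))

n = 10
print(expected_steps_uniform(n), claimed_bound(n))
```

For this particular $F$ the exact mean is the harmonic number $H_n$, which indeed stays below the stated sum; other choices of $F$ would need their own recurrence.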

99. [HM30] Show that the result in the previous exercise holds even when the range
of $F(X)$ is $(-\infty\,..\,X]$, given $0 < g_1 \le \cdots \le g_n \le g_{n+1} \le \cdots$. (Thus $X$ might increase.)

100. [HM17] A certain randomized algorithm takes $T$ steps, where $\Pr(T = t) = p_t$ for
$1 \le t \le \infty$. Prove that (a) $\lim_{m\to\infty} \mathrm{E}\min(m, T) = \mathrm{E}\,T$; (b) $\mathrm{E}\,T < \infty$ implies $p_\infty = 0$.

101. [HM22] Suppose $X = X_1 + \cdots + X_m$ is the sum of independent geometrically
distributed random integers, with $\Pr(X_k = n) = p_k(1 - p_k)^{n-1}$ for $n \ge 1$. Prove that
$\Pr(X \ge r\mu) \le re^{1-r}$ for all $r \ge 1$, where $\mu = \mathrm{E}\,X = \sum_{k=1}^{m} 1/p_k$.

102. [M20] Cora collects coupons, using a random process. After already owning
$k - 1$ of them, her chance of success when trying for the $k$th is at least one chance
in $s_k$, independent of any previous successes or failures. Prove that she will a.s. own
$m$ coupons before making $(s_1 + \cdots + s_m)\ln n$ trials. And she will q.s. need at most
$s_k \ln n \ln\ln n$ trials to obtain the $k$th coupon, for each $k \le m$, if $m = O(n^{1000})$.

► 103. [M30] This exercise is based on two functions of the ternary digits {0, 1, 2}:

$f_0(x) = \max(0, x - 1)$; $f_1(x) = \min(2, x + 1)$.

a) What is $\Pr(f_{X_1}(f_{X_2}(\ldots(f_{X_n}(i))\ldots)) = j)$, for each $i, j \in \{0, 1, 2\}$, assuming that
$X_1$, $X_2$, ..., $X_n$ are independent, uniformly random bits?

b) Here’s an algorithm that computes $f_{X_1}(f_{X_2}(\ldots(f_{X_n}(i))\ldots))$ for $i \in \{0, 1, 2\}$, and
stops when all three values have coalesced to a common value:

Set $a_0a_1a_2 \gets 012$ and $n \gets 0$. Then while $a_0 \ne a_2$, set $n \gets n + 1$,
$t_0t_1t_2 \gets (X_n?\ 122{:}\ 001)$, and $a_0a_1a_2 \gets a_{t_0}a_{t_1}a_{t_2}$. Output $a_0$.

(Notice that $a_0 \le a_1 \le a_2$ always holds.) What is the probability that this
algorithm outputs $j$? What are the mean and variance of $N$, the final value of $n$?

c) A similar algorithm computes $f_{X_n}(\ldots(f_{X_2}(f_{X_1}(i)))\ldots)$, if we change ‘$a_0a_1a_2 \gets a_{t_0}a_{t_1}a_{t_2}$’
to ‘$a_0a_1a_2 \gets t_{a_0}t_{a_1}t_{a_2}$’. What’s the probability of output $j$ in this algorithm?

d) Why on earth are the results of (b) and (c) so different?

e) The algorithm in (c) doesn’t really use $a_1$. Therefore we might try to speed
up process (b) by cleverly evaluating the functions in the opposite direction.
Consider the following subroutine, called sub($T$):

Set $a_0a_2 \gets 02$ and $n \gets 0$. Then while $n < T$ set $n \gets n + 1$, $X_n \gets$ random
bit, and $a_0a_2 \gets (X_n?\ f_1(a_0)f_1(a_2){:}\ f_0(a_0)f_0(a_2))$. If $a_0 = a_2$ output $a_0$,
otherwise output $-1$.

Then the algorithm of (b) would seem to be equivalent to

Set $T \gets 1$, $a \gets -1$; while $a < 0$ set $T \gets 2T$ and $a \gets$ sub($T$); output $a$.

Prove, however, that this fails. (Randomized algorithms can be quite delicate!)

f) Patch the algorithm of (e) and obtain a correct alternative to (b).
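
The two update rules of exercise 103(b) and 103(c) are easy to experiment with. The following sketch implements both literally, taking the bits from any iterator so that a fixed bit sequence can be replayed; the function names, the seed, and the number of trials are arbitrary choices made for this illustration, and the tallies it prints are only empirical.

```python
import random
from typing import Iterator, Tuple

def backward(bits: Iterator[int]) -> Tuple[int, int]:
    """Algorithm (b): each new function is applied innermost, a_i <- a_{t_i}."""
    a, n = [0, 1, 2], 0
    while a[0] != a[2]:
        n += 1
        t = (1, 2, 2) if next(bits) else (0, 0, 1)   # value table of f_1 or f_0
        a = [a[t[0]], a[t[1]], a[t[2]]]
    return a[0], n

def forward(bits: Iterator[int]) -> Tuple[int, int]:
    """Algorithm (c): each new function is applied outermost, a_i <- t_{a_i}."""
    a, n = [0, 1, 2], 0
    while a[0] != a[2]:
        n += 1
        t = (1, 2, 2) if next(bits) else (0, 0, 1)
        a = [t[a[0]], t[a[1]], t[a[2]]]
    return a[0], n

def random_bits(rng: random.Random) -> Iterator[int]:
    """An endless stream of independent uniform bits."""
    while True:
        yield rng.getrandbits(1)

rng = random.Random(2015)        # arbitrary seed, used only for reproducibility
trials = 10_000
counts_b = [0, 0, 0]
counts_c = [0, 0, 0]
for _ in range(trials):
    counts_b[backward(random_bits(rng))[0]] += 1
    counts_c[forward(random_bits(rng))[0]] += 1
print("outputs of (b):", counts_b)
print("outputs of (c):", counts_c)
```

Replaying the same bits through both routines, for example `backward(iter([1, 0, 0]))` and `forward(iter([1, 0, 0]))`, shows directly how the order of composition changes the result.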

104. [M21] Solve exercise 103(b) and 103(c) when each Xk is 1 with probability p. 


► 105. [M30] (Random walk on an $n$-cycle.) Given integers $a$ and $n$, with $0 \le a \le n$,
let $N$ be minimum such that $(a + (-1)^{X_1} + (-1)^{X_2} + \cdots + (-1)^{X_N}) \bmod n = 0$, where
$X_1$, $X_2$, ... is a sequence of independent random bits. Find the generating function
$g_a = \sum_{k=0}^{\infty} \Pr(N = k)\,z^k$. What are the mean and variance of $N$?

106. [M25] Consider the algorithm of exercise 103(b) when the digits are $d$-ary instead
of ternary; thus $f_0(x) = \max(0, x - 1)$ and $f_1(x) = \min(d - 1, x + 1)$. Find the
generating function, mean, and variance of the number $N$ of steps required before
$a_0 = a_1 = \cdots = a_{d-1}$ is first reached in this more general situation.

► 107. [M22] (Coupling.) If $X$ is a random variable on the probability space $\Omega'$ and
$Y$ is another random variable on another probability space $\Omega''$, we can study them
together by redefining them on a common probability space $\Omega$. All conclusions about
$X$ or $Y$ are valid with respect to $\Omega$, provided that we have $\Pr(X = x) = \Pr'(X = x)$
and $\Pr(Y = y) = \Pr''(Y = y)$ for all $x$ and $y$.

Such “coupling” is obviously possible if we let $\Omega$ be the set $\Omega' \times \Omega''$ of pairs
$\{\omega'\omega'' \mid \omega' \in \Omega' \text{ and } \omega'' \in \Omega''\}$, and if we define $\Pr(\omega'\omega'') = \Pr'(\omega')\Pr''(\omega'')$ for each
pair of events. But coupling can also be achieved in many other ways.

For example, suppose $\Omega'$ and $\Omega''$ each contain only two events, $\{\mathrm{Q}, \mathrm{K}\}$ and $\{\clubsuit, \spadesuit\}$,
with $\Pr'(\mathrm{Q}) = p$, $\Pr'(\mathrm{K}) = 1 - p$, $\Pr''(\clubsuit) = q$, $\Pr''(\spadesuit) = 1 - q$. We could couple them
with a four-event space $\Omega = \{\mathrm{Q}\clubsuit, \mathrm{Q}\spadesuit, \mathrm{K}\clubsuit, \mathrm{K}\spadesuit\}$, having $\Pr(\mathrm{Q}\clubsuit) = pq$, $\Pr(\mathrm{Q}\spadesuit) = p(1 - q)$,
$\Pr(\mathrm{K}\clubsuit) = (1 - p)q$, $\Pr(\mathrm{K}\spadesuit) = (1 - p)(1 - q)$. But if $p < q$ we could also get by with just
three events, letting $\Pr(\mathrm{Q}\clubsuit) = p$, $\Pr(\mathrm{K}\clubsuit) = q - p$, $\Pr(\mathrm{K}\spadesuit) = 1 - q$. A similar scheme
works when $p > q$, omitting $\mathrm{K}\clubsuit$. And if $p = q$ we need only two events, $\mathrm{Q}\clubsuit$ and $\mathrm{K}\spadesuit$.

a) Show that if $\Omega'$ and $\Omega''$ each have just three events, with respective probabilities
$\{p_1, p_2, p_3\}$ and $\{q_1, q_2, q_3\}$, they can always be coupled in a five-event space $\Omega$.

b) Also, four events suffice if {pi,P 2 ,P 3 } = { 忐，吾，告 } ，切 i ， 的，卵 } = { 吾，垚，長 }. 

c) But some three-event distributions cannot be coupled with fewer than five. 
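
The three-event coupling described in the introduction to exercise 107 (the case where the first card probability is smaller than the first suit probability) can be verified mechanically: summing the coupled probabilities over suits must give back the rank distribution, and vice versa. In this sketch the labels `"club"` and `"spade"`, the function names, and the sample values $p = 1/3$, $q = 1/2$ are assumptions chosen only for illustration.

```python
from fractions import Fraction

def three_event_coupling(p: Fraction, q: Fraction) -> dict:
    """The three-event space from the text, valid when p < q."""
    assert 0 <= p < q <= 1
    return {("Q", "club"): p, ("K", "club"): q - p, ("K", "spade"): 1 - q}

def marginals(space: dict):
    """Recover the rank and suit distributions by summing over the coupled events."""
    rank, suit = {}, {}
    for (r, s), pr in space.items():
        rank[r] = rank.get(r, 0) + pr
        suit[s] = suit.get(s, 0) + pr
    return rank, suit

p, q = Fraction(1, 3), Fraction(1, 2)
rank, suit = marginals(three_event_coupling(p, q))
print(rank, suit)
```

The same `marginals` check applies unchanged to any candidate coupling, for instance one proposed for part (a).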

108. [HM21] If $X$ and $Y$ are integer-valued random variables such that $\Pr'(X \ge n) \le
\Pr''(Y \ge n)$ for all integers $n$, find a way to couple them so that $X \le Y$ always holds.

109. [M27] Suppose $X$ and $Y$ have values in a finite partially ordered set $P$, and that

$\Pr'(X \succeq a \text{ for some } a \in A) \le \Pr''(Y \succeq a \text{ for some } a \in A)$, for all $A \subseteq P$.

We will show that there’s a coupling in which $X \preceq Y$ always holds.

a) Write out exactly what needs to be proved, in the simple case where $P = \{1, 2, 3\}$
and the partial order has $1 \prec 3$, $2 \prec 3$. (Let $p_k = \Pr'(X = k)$ and $q_k = \Pr''(Y = k)$
for $k \in P$. When $P = \{1, \ldots, n\}$, a coupling is an $n \times n$ matrix $(p_{ij})$ of nonnegative
probabilities whose row sums are $\sum_j p_{ij} = p_i$ and column sums are $\sum_i p_{ij} = q_j$.)
Compare this to the result proved in the preceding exercise.

b) Prove that $\Pr'(X \preceq b \text{ for some } b \in B) \ge \Pr''(Y \preceq b \text{ for some } b \in B)$, for all $B \subseteq P$.

c) A coupling between $n$ pairs of events can be viewed as a flow in a network that
has $2n + 2$ vertices $\{s, x_1, \ldots, x_n, y_1, \ldots, y_n, t\}$, where there are $p_i$ units of flow
from $s$ to $x_i$, $p_{ij}$ units of flow from $x_i$ to $y_j$, and $q_j$ units of flow from $y_j$ to $t$. The
“max-flow min-cut theorem” [see Section 7.5.3] states that such a flow is possible
if and only if there are no subsets $I, J \subseteq \{1, \ldots, n\}$ such that (i) every path from
$s$ to $t$ goes through some arc $s \to x_i$ for $i \in I$ or some arc $y_j \to t$ for $j \in J$, and
(ii) $\sum_{i\in I} p_i + \sum_{j\in J} q_j < 1$. Use that theorem to prove the desired result.


110. [M25] If $X$ and $Y$ take values in $\{1, \ldots, n\}$, let $p_k = \Pr'(X = k)$, $q_k = \Pr''(Y = k)$,
and $r_k = \min(p_k, q_k)$ for $1 \le k \le n$. The probability that $X = Y$ in any coupling is
obviously at most $r = \sum_{k=1}^{n} r_k$.


a) Show that there always is a coupling with $\Pr(X = Y) = r$.

b) Can the result of the previous exercise be extended, so that we have not only
$\Pr(X \preceq Y) = 1$ but also $\Pr(X = Y) = r$?

► 111. [M20] A family of $N$ permutations of the numbers $\{1, \ldots, n\}$ is called minwise
independent if, whenever $1 \le j \le k \le n$ and $\{a_1, \ldots, a_k\} \subseteq \{1, \ldots, n\}$, exactly $N/k$ of
the permutations $\pi$ have $\min(a_1\pi, \ldots, a_k\pi) = a_j\pi$.

For example, the family $F$ of $N = 60$ permutations obtained by cyclic shifts of




123456, 126345, 152346, 152634, 164235, 154263, 165324, 164523, 156342, 165432


can be shown to be minwise independent permutations of $\{1, 2, 3, 4, 5, 6\}$.

a) Verify the independence condition for $F$ in the case $k = 3$, $a_1 = 1$, $a_2 = 3$, $a_3 = 4$.

b) Suppose we choose a random $\pi$ from a minwise independent family, and assign
the “sketch” $S_A = \min_{a\in A} a\pi$ to every $A \subseteq \{1, \ldots, n\}$. Prove that, if $A$ and $B$
are arbitrary subsets, $\Pr(S_A = S_B) = |A \cap B| / |A \cup B|$.

c) Given three subsets $A$, $B$, $C$, what is $\Pr(S_A = S_B = S_C)$?
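
The definition in exercise 111 can be tested by brute force on small families. The sketch below is one possible checker; it assumes that a permutation is stored as a tuple `perm` with `perm[a - 1]` equal to the image $a\pi$ (a representation chosen here, not taken from the text), and it is exercised only on two tiny families rather than on the 60-element family $F$.

```python
from itertools import combinations, permutations

def is_minwise_independent(family, n: int) -> bool:
    """Check the definition directly; perm[a - 1] is the image of a under the permutation."""
    N = len(family)
    for k in range(1, n + 1):
        if N % k:
            return False                 # N/k would not be an integer
        for subset in combinations(range(1, n + 1), k):
            for a in subset:
                hits = sum(
                    1 for perm in family
                    if min(perm[b - 1] for b in subset) == perm[a - 1]
                )
                if hits * k != N:        # exactly N/k permutations must put the minimum at a
                    return False
    return True

all_perms = list(permutations(range(1, 4)))    # every permutation of {1, 2, 3}
full_ok = is_minwise_independent(all_perms, 3)
single_ok = is_minwise_independent([(1, 2, 3)], 3)
print(full_ok, single_ok)
```

The running time grows like $2^n$ times the family size, which is harmless for $n = 6$ but would not scale to large $n$.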

112. [M25] The size of a family $F$ of minwise independent permutations must be a
multiple of $k$ for each $k \le n$, by definition. In this exercise we’ll see how to construct
such a family with the minimum possible size, namely $N = \mathrm{lcm}(1, 2, \ldots, n)$.

The basic idea is that, if all elements of the permutations in $F$ that exceed $m$ are
replaced by $\infty$, the “truncated” family is still minwise independent in the sense that, if
$\min_{a\in A} a\pi = \infty$, we can imagine that the minimum occurs at a random element of $A$.
(This can happen only if $\pi$ takes all elements of $A$ to $\infty$.)

a) Conversely, show that an $m$-truncated family can be lifted to an $(m+1)$-truncated
family if, for each subset $B$ of size $n - m$, we insert $m + 1$ equally often into each
of $B$’s $n - m$ positions, within the permutations whose $\infty$’s are in $B$.

b) Use this principle to construct minimum-size families $F$.

113. [M25] Although minwise permutations are defined only in terms of the minimum
operation, a minwise independent family actually turns out to be also maxwise
independent, and even more is true!

a) Let $E$ be the event that $a_i\pi < k$, $b\pi = k$, and $c_j\pi > k$, for any disjoint sets
$\{a_1, \ldots, a_l\}$, $\{b\}$, $\{c_1, \ldots, c_r\} \subseteq \{1, \ldots, n\}$. Prove that, if $\pi$ is chosen randomly
from a minwise independent set, $\Pr(E)$ is the same as the probability that $E$
occurs when $\pi$ is chosen randomly from the set of all permutations. (For example,
$\Pr(5\pi < 7,\ 2\pi = 7,\ 1\pi > 7,\ 8\pi > 7) = 6(n-7)(n-8)(n-4)!/n!$, whenever $n \ge 8$.)

b) Furthermore, if $\{a_1, \ldots, a_k\} \subseteq \{1, \ldots, n\}$, the probability that $a_j\pi$ is the $r$th largest
element of $\{a_1\pi, \ldots, a_k\pi\}$ is $1/k$, whenever $1 \le j, r \le k$.

► 114. [M28] (The “combinatorial nullstellensatz.”) Let $f(x_1, \ldots, x_n)$ be a polynomial
in which the coefficient of $x_1^{d_1}\ldots x_n^{d_n}$ is nonzero and each term has degree $\le d_1 + \cdots + d_n$.
Given subsets $S_1$, ..., $S_n$ of the field of coefficients, with $|S_j| > d_j$ for $1 \le j \le n$, choose
$X_1$, ..., $X_n$ independently and uniformly, with each $X_j \in S_j$. Prove that

$$\Pr\bigl(f(X_1, \ldots, X_n) \ne 0\bigr) \ge \frac{|S_1| + \cdots + |S_n| - (d_1 + \cdots + d_n + n) + 1}{|S_1| \ldots |S_n|}.$$

Hint: See exercise 4.6.1-16.


115. [M21] Prove that an $m \times n$ grid cannot be fully covered by $p$ horizontal lines,
$q$ vertical lines, $r$ diagonal lines of slope $+1$, and $r$ diagonal lines of slope $-1$, if
$m = p + 2\lfloor r/2\rfloor + 1$ and $n = q + 2\lceil r/2\rceil + 1$. Hint: Apply exercise 114 to a suitable
polynomial.
116. [HM25] Use exercise 114 to prove that, if $p$ is prime, any multigraph $G$ on $n$
vertices with more than $(p - 1)n$ edges contains a nonempty subgraph in which the
degree of every vertex is a multiple of $p$. (In particular, if each vertex of $G$ has fewer
than $2p$ neighbors, $G$ contains a $p$-regular subgraph. A loop from $v$ to itself adds two
to $v$’s degree.) Hint: Let the polynomial contain a variable $x_e$ for each edge $e$ of $G$.

► 117. [HM25] Let $X$ have the binomial distribution $B_n(p)$, so that $\Pr(X = k) =
\binom{n}{k}p^k(1 - p)^{n-k}$ for $0 \le k \le n$. Prove that $X \bmod m$ is approximately uniform:

Pr(X mod m = r ) —— — 

m 

118. [M20] Prove that the second moment principle implies the Paley-Zygmund
inequality

$$\Pr(X \ge x) \ge \frac{(\mathrm{E}\,X - x)^2}{\mathrm{E}\,X^2}, \qquad\text{if } 0 \le x \le \mathrm{E}\,X.$$

119. [HM24] Let $x$ be a fixed value in $[0\,..\,1]$. Prove that, if we independently and
uniformly choose $U \in [0\,..\,x]$, $V \in [x\,..\,1]$, $W \in [0\,..\,1]$, then the median $\langle UVW\rangle$ is
uniformly distributed in $[\min(U, V, W)\,..\,\max(U, V, W)]$.

120. [M20] Consider random binary search trees $T_n$ obtained by successively inserting
independent uniform deviates $U_1$, $U_2$, ... into an initially empty tree. Let $T_{nk}$ be the
number of external nodes on level $k$, and define $T_n(z) = \sum_k T_{nk}z^k/(n+1)$. Prove that
$Z_n = T_n(z)/g_{n+1}(z)$ is a martingale, where $g_n(z) = (2z + n - 2)(2z + n - 3)\ldots(2z)/n!$
is the generating function for the cost of the $n$th insertion (exercise 6.2.2-6).

► 121. [M25] Let $X$ and $Y$ be random variables with the distributions $\Pr(X = t) = x(t)$
and $\Pr(Y = t) = y(t)$. The ratio $\rho(t) = y(t)/x(t)$, which may be infinity, is called the
probability density of $Y$ with respect to $X$. We define the relative entropy of $X$ with
respect to $Y$, also called the Kullback-Leibler divergence of $Y$ from $X$, by the formulas

$$D(y\,\|\,x) = \mathrm{E}\bigl(\rho(X)\lg\rho(X)\bigr) = \mathrm{E}\lg\rho(Y) = \sum_t y(t)\lg\frac{y(t)}{x(t)},$$

with $0\lg 0$ and $0\lg(0/0)$ understood to mean 0. It can be viewed intuitively as the
number of bits of information that are lost when $X$ is used to approximate $Y$.

a) Suppose X is a random six-sided die with the uniform distribution，but Y is 
a “loaded” die in which Pr{Y = 0) = | and Pr(Y = U3)= 吾 ， instead of 
Compute D(y\\x) and D(x\\y). 

b) Prove that $D(y\,\|\,x) \ge 0$. When is it zero?

c) If $p = \Pr(X \in T)$ and $q = \Pr(Y \in T)$, show that $\mathrm{E}(\lg\rho(Y) \mid Y \in T) \ge \lg(q/p)$.

d) Suppose $x(t) = 1/m$ for all $t$ in an $m$-element set $S$, and $y(t) \ne 0$ only when $t \in S$.
Express $D(y\,\|\,x)$ in terms of the entropy $H_Y = \mathrm{E}\lg(1/y(Y))$ (see Eq. 6.2.2-(18)).

e) Let $z(u, v) = \Pr(X = u \text{ and } Y = v)$ when $X$ and $Y$ have any joint distribution,
and let $w(u, v)$ be that same probability under the assumption that $X$ and $Y$ are
independent. The joint entropy $H_{X,Y}$ is defined to be $H_Z$, and the mutual information
$I_{X,Y}$ is defined to be $D(z\,\|\,w)$. Prove that $H_W = H_X + H_Y$ and $I_{X,Y} =
H_W - H_Z$. (Consequently $H_{X,Y} \le H_X + H_Y$, and $I_{X,Y}$ measures the difference.)
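
The defining sum for $D(y\,\|\,x)$ in exercise 121 translates directly into code. The sketch below represents each distribution as a list of probabilities indexed by $t$ (a representation assumed here for convenience) and applies the stated conventions for zero terms; the function name `kl_divergence` and the two tiny test distributions are illustrative choices.

```python
from math import inf, log2

def kl_divergence(y, x) -> float:
    """D(y||x) = sum of y(t) lg(y(t)/x(t)), in bits, with 0 lg 0 and 0 lg(0/0) taken as 0."""
    total = 0.0
    for yt, xt in zip(y, x):
        if yt == 0:
            continue                 # the term is 0 by convention
        if xt == 0:
            return inf               # the density rho(t) = y(t)/x(t) is infinite here
        total += yt * log2(yt / xt)
    return total

fair = [0.5, 0.5]
certain = [1.0, 0.0]
print(kl_divergence(certain, fair), kl_divergence(fair, certain))
```

The two printed values differ, which illustrates that the divergence is not symmetric in its arguments.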

122. [HM24] Continuing exercise 121, compute $D(y\,\|\,x)$ and $D(x\,\|\,y)$ when

a) $x(t) = 1/2^{t+1}$ and $y(t) = 3^t/4^{t+1}$ for $t = 0, 1, 2, \ldots$;

b) $x(t) = e^{-np}(np)^t/t!$ and $y(t) = \binom{n}{t}p^t(1 - p)^{n-t}$, for $t \ge 0$ and $0 < p < 1$. (Give
asymptotic answers with absolute error $O(1/n)$, for fixed $p$ as $n \to \infty$.)


- 0。 < 爪. 
► 123. [M20] Let $X$ and $Y$ be as in exercise 121. The random variable $Z = A?\ Y{:}\ X$
either has the distribution $x(t)$ or $y(t)$, but we don’t know whether $A$ is true or false. If
we believe that the hypothesis $Z = Y$ holds with the a priori probability $\Pr(A) = p_k$,
we assume that $z_k(t) = \Pr_k(Z = t) = p_k y(t) + (1 - p_k)x(t)$. But after seeing a
new value of $Z$, say $Z = Z_k$, we will believe the hypothesis with the a posteriori
probability $p_{k+1} = \Pr(A \mid Z_k)$. Show that $D(y\,\|\,x)$ is the expected “information gained,”
$\lg(p_{k+1}/(1 - p_{k+1})) - \lg(p_k/(1 - p_k))$, averaged with respect to the distribution of $Y$.

124. [HM22] (Importance sampling.) In the setting of exercise 121, we have $\mathrm{E}\,f(Y) =
\mathrm{E}(\rho(X)f(X))$ for any function $f$; thus $\rho(t)$ measures the “importance” of the $X$-value $t$
with respect to the $Y$-value $t$. Many situations arise when it’s easy to generate random
variables with an approximate distribution $x(t)$, but difficult to generate them with
the exact distribution $y(t)$. In such cases we can estimate the average value $E(f) =
\mathrm{E}\,f(Y)$ by calculating $E_n(f) = (\rho(X_1)f(X_1) + \cdots + \rho(X_n)f(X_n))/n$, where the $X_j$ are
independent random variables, each distributed as $x(t)$.

Let $n = c^4 2^{D(y\|x)}$. Prove that if $c > 1$, this estimate $E_n$ is relatively accurate:

$\mathrm{E}\,|E(f) - E_n(f)| \le \|f\|\,(1/c + 2\sqrt{\Delta_c})$, where $\Delta_c = \Pr(\rho(Y) > c^2 2^{D(y\|x)})$.

(Here $\|f\|$ denotes $(\mathrm{E}\,f(Y)^2)^{1/2}$.) On the other hand if $c < 1$ the estimate is poor:

$\Pr(E_n(1) \ge a) \le c^2 + (1 - \Delta_c)/a$, for $0 < a < 1$.

Here ‘1’ denotes the constant function $f(y) = 1$ (hence $E(1) = 1$).

Every man must judge for himself between conflicting vague probabilities. 

— CHARLES DARWIN, letter to N. A. von Mengden (5 June 1879) 




October 3, 2015 



ANSWERS TO EXERCISES 


It isn’t that they can’t see the solution. 
It is that they can’t see the problem. 

G. K. CHESTERTON ，The Scandal of Father Brown (1935) 


MATHEMATICAL PRELIMINARIES REDUX 

1. (a) A beats B in 5 + 0 + 5 + 5 + 0 + 5 cases out of 36; B beats C in 4+2 + 4+4+2+4; 
C beats A in 2 + 2 + 2 + 6 + 2 + 6. 

(b) The unique solution，without going to more than six spots per face，is 



(c) A = {F m - 2 x x 4}, B = {F m x3}, C = {F m _i x 2 5 F m _ 2 x 5} makes 

Pr((7>^4) = F m - 2 F m+ i/F^; and we have F m ^ 2 F m +i = F m -iF m - (-l) m . [Similarly, 
with n faces and A = {l_n/0 2 」x 1, [n/0] x 4} ， etc- the probabilities are 1/0 — 0(l/n). 
See R. P. Savage, Jr., AMM 101 (1994), 429-436.] 

2. Let Pr(A > B) = 乂， Pr(B > C) =8, Pr((7 > A) = C. We can assume that no x 
appears on more than one die; if it did, we could replace it by a: + e in A and x — e in C 
(for small enough e) without decreasing or C. So we can list the face elements in 

non decreasing order and replace each one by the name of its die; for example, the pre¬ 
vious answer (b) yields CBBBAAAAACCCCCBBBA. Clearly AB^ BC, and CA are 
never consecutive in an optimal arrangement of this kind: BA is always better than AB. 

Suppose the sequence is C Cl B bl A ai • • • C Ck B bk A ak where c* > 0 for 1 < i < /c 

and bi，ai > 0 for 1 < i < k. Let ol{ = a《/(ai + • • • + ak) ， /3i = h/( 6 i + • • • + b“, 

= a / (ci H - hc/c); then A = ai/3i +a 2 (/?i +/ 5 ‘ 2 ) H - , B = (3i 71 + 卢 2(71 + 72 ) H - ， 

C = 720 a + 73 ( 0^1 + a‘ 2 ) + • • • • We will show that < 1/0 when the a’s ， 

(3% and 7 ’s are nonnegative real numbers; then it is < 1/0 when they are rational 
The key idea is that we can assume h < 2 and ol^ = 0- Otherwise the following 
transformation leads to a shorter array without decreasing 8^ or C: 

7*2 = A 72 ， 7 ; = 7l+72 — 72 5 P 2 = 01 = ^ 1 +^ 2-/^2 5 a l = <^l/A, Oi2 — O^+a‘ 2 — 

Indeed, A / = C f = and B r — B = {1 — 入 ）（ /3i — 入 # 2 ) 72 , and we can choose A thus: 
Case 1: /3i > ^ 2 - Choose 入 = ai / (ai + 02 )，making ^2 = 0. 

Case 2: < p 2 and 71/72 < P 1 /P 2 - Choose A = 1 + 71 / 72 , making 7 ^ = 0 . 

Case 3: /3i < p 2 and 71/72 > ^ 1 /^ 2 - Choose A = 1 + Pi /making ^ = 0. 
Finally, then, 乂 = 外 ，/ 5 = 1 — ^ 72 , C = 72 ; they can’t all be greater than l/4>. 

[Similarly, with n dice, the asymptotic optimum probability p n satisfies p n = 

o4 n ) = 1 — Q^ n- 1 )o4 n ) =••• = ；[ — q^ 2 )q4 3 ) = a[ 2 \ One can show that f n (l — p n ) = 0, 
where $f_{n+1}(x) = f_n(x) - xf_{n-1}(x)$, $f_0(x) = 1$, $f_1(x) = 1 - x$. Then $f_n(x^2)$ is expressible
as the Chebyshev polynomial $x^{n+1}U_{n+1}(\frac{1}{2x})$; and we have $p_n = 1 - 1/(4\cos^2 \pi/(n+2))$.
See Z. Usiskin, Annals of Mathematical Statistics 35 (1964), 857-862; S. Trybuła,
Zastosowania Matematyki 8 (1965), 143-156.]

3. Brute force (namely a program) finds eight solutions, of which the simplest is 



all with respective probabilities 蘇，劈， 暴 [If 回 is also allowed, the unique solution 



has the property that every roll has exactly one die below the average and two above ， 
with each of A, (7 equally likely to be below; hence all three probabilities are 2/3- 
See J- Moraleda and D. G. Stork, College Mathematics Journal 43 (2012) ， 152-159.] 

4. (a) The permutation (1 2 34)(5 6 ) takes A. So B versus C 

is like A versus etc. Also Pr(A beats C) = Pr(C beats A) = Pr(B beats D )= 
Pr(D beats B) = |||; Pr(A and C tie) = Pr(B and D tie)= 错 . 

(b) Assume by symmetry the players are C. Then the bingoers are (A, B ， (7, 

AB, AC, BC, ABC) with respective probabilities (168, 216,168,48,72,36,12)/720. 

(c) Ifs (A, AB, AC, ABC, ABCD) with probabilities (120,24,48,12,0)/720. 

5. (a) If $A_k = 1001$ with probability .99, otherwise $A_k = 0$, but $B_k = 1000$ always,
then $P_{1000} = .99^{1000} \approx .000043$. (This example gives the smallest possible $P_{1000}$,
because $\Pr((A_1 - B_1) + \cdots + (A_n - B_n) > 0) \ge \Pr([A_1 > B_1] \ldots [A_n > B_n]) = P_1^n$.)

(b) Let E = qo + q 2 + ^4 + ••- ~ 0.67915 be the probability that B = 0. Then 

Pr(A >B)= + E.to fe+i) « .47402; Pr (乂 < B) = ZT=o (1 -五 + 

Eto ^ 2 j) ^ -30807; and Pr(A = B) = Pr(A = B = 0) = E(1 - E) ^ .21790 is also the 
probability that AB > 0. 

(c) During the first nk rounds，the probability that either Alice or Bob has scored 
more than is at most nk{qk+i + O.k +2 + •••) = 0 ( 2 _/c ); and the probability that 
neither has ever scored rrik is (1 — qk) Uk < exp(—g^n/c) = exp(—2 k /D). Also > 
n/cm/c-i when k > 1. Thus Alice “quite surely” wins when k is even, but loses when k 
is odd，as A: —> oo. [The American Statistician 43 (1989) ， 277-278.] 
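
The arithmetic behind the example in part (a) of this answer is quick to confirm. The sketch below uses hypothetical variable names; it checks that a single score of 0 among Alice’s 1000 rounds already leaves her total below Bob’s, so that she wins overall only when all 1000 rounds succeed, and then evaluates $.99^{1000}$.

```python
n, high, fixed = 1000, 1001, 1000   # A_k is 1001 or 0; B_k is always 1000

# With one score of 0, Alice's total is at most 999 * 1001, which is less than 1000 * 1000.
one_miss_total = (n - 1) * high
bob_total = n * fixed
all_hits_total = n * high

p_win = 0.99 ** n                   # probability that all n rounds give 1001
print(one_miss_total, bob_total, all_hits_total, p_win)
```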

6 . The probability that Xj = 1 is clearly pi = l/(n — 1); hence Xj = 0 with probabil¬ 
ity po = (n —2)/(n —1). And the probability that X{ = Xj = 1 when i < j is pi. Thus 
(see exercise 20), (X“Xj) will equal (0,1) ，（ 1, 0), or (0, 0) with the correct probabilities 
popi, pipo, PoPo- But Xi = Xj = Xk = 1 with probability 0 when i < j < k. 

For 3-wise independence let Pr(Xi . •. X n = xi ... x n ) = a Xl +---+x n where 

ao = 2 ( n ~ 2 ), ai = ( n I 2 ), ci 3 = 1 , otherwise aj = 0 . 

7. Let f m (n) = SJLo (?)(—l) J ( n + 1 — rn) m ~\ and define probabilities via aj = 

//,_j(n—j) as in answer 6 . (In particular, we have /o(n) = 1 ， /i(n) = 0, /‘ 2 (n)= ( 打； 1 )， 
f 3 (n)= 2( n ~ 2 ), f 4 (n)= 3( n : 3 ) + ( n ; 3 ) 2 .) This definition is valid if we can prove that 
/ m (n) > 0 for n > because of the identity Q)/ m _j(n — j) = (n + 1 — m) m • 

To prove that inequality, Schulte-Geers notes (see CMath ( 5 . 19 )) that / m (n)= 
E :0 ( m r)b - m ) m ~ k = E :0 m(-l) fc (n - mr- k ' these terms pair up 
nicely to yield ZZ=o M'—W)( n _ 爪 ) m —even] + (^[m even]. 
8. If $0 < k < n$, the probability that $k$ of the variables have any particular setting
is $1/2^k$, because the remaining variables have even parity as often as odd parity. So
there’s $(n - 1)$-wise independence, but not $n$-wise.
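
The parity argument in answer 8 can be confirmed by enumeration for a small $n$. This sketch assumes that the distribution in question is uniform over the bit vectors of even parity, and it tests $k$-wise independence as joint uniformity of every $k$ coordinates; both the reading of the exercise and the function name are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import combinations, product

def is_k_wise_independent(dist: dict, n: int, k: int) -> bool:
    """True if every k of the n bits are jointly uniform (each setting has probability 1/2^k)."""
    for positions in combinations(range(n), k):
        for setting in product((0, 1), repeat=k):
            pr = sum(
                w for bits, w in dist.items()
                if all(bits[i] == s for i, s in zip(positions, setting))
            )
            if pr != Fraction(1, 2**k):
                return False
    return True

n = 4
even = [bits for bits in product((0, 1), repeat=n) if sum(bits) % 2 == 0]
dist = {bits: Fraction(1, len(even)) for bits in even}   # assumed: uniform on even-parity vectors
print(is_k_wise_independent(dist, n, n - 1), is_k_wise_independent(dist, n, n))
```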

9. Give probability 1/2 to 0 … 0 and 1 … 1; all other vectors have probability 0. 

10. If $n > p$ we have $X_{p+1} = X_1$, so there’s no independence. Otherwise, if $m \le n \le p$,
there’s $m$-wise independence because any $m$ vectors $(1, j, \ldots, j^{m-1})$ are linearly
independent modulo $p$ (they’re columns of Vandermonde’s matrix, exercise 1.2.3-37); but
the $X$’s are dependent $(m + 1)$-wise, because a polynomial of degree $m$ cannot have
$m + 1$ different roots. If $m > n$ and $n \le p$ there is complete independence.

Instead of working mod $p$, we could use any finite field in this construction.

11. We can assume that $n = 1$, because $(X_1 + \cdots + X_n)/n$ and $(X_{n+1} + \cdots + X_{2n})/n$
are independent random variables with the same discrete distribution. Then $\Pr(|X_1 +
X_2 - 2a| \le 2|X_1 - a|) \ge \Pr(|X_1 - a| + |X_2 - a| \le 2|X_1 - a|) = \Pr(|X_2 - a| \le
|X_1 - a|) \ge (1 + \Pr(X_1 = X_2))/2 > 1/2$. [This exercise was suggested by T. M. Cover.]

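12. Let $w = \Pr(A \text{ and } B)$, $x = \Pr(A \text{ and } \bar B)$, $y = \Pr(\bar A \text{ and } B)$, $z = \Pr(\bar A \text{ and } \bar B)$. All
five statements are equivalent to $wz > xy$, or to $\left|\begin{smallmatrix} w & x\\ y & z\end{smallmatrix}\right| > 0$, or to “$A$ and $B$ are strictly
positively correlated” (see exercise 61). [This exercise was suggested by E. Georgiadis.]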

13. False in many cases* For example, take Pr(A and 5 and C) = Pr(A and B and C )= 
0, Pr(Aand J5and(7) = 2/7, and all other probabilities 1/7. 

14. Induction on $n$. [Philosophical Transactions 53 (1763), 370-418, proof of Prop. 6.]

15. If $\Pr(C) > 0$, this is the chain rule, conditional on $C$. But if $\Pr(C) = 0$, it’s false
by our conventions, unless $A$ and $B$ are independent.

16. If and only if Pr(A n B n (7) = 0 # Pr(B) or Pr(A n C) = 0. 

17. 4/51, because four of the cards other than Q♠ are aces.

18. Since $(M - X)(X - m) \ge 0$, we have $(M\,\mathrm{E}\,X) - (\mathrm{E}\,X^2) + (m\,\mathrm{E}\,X) - mM \ge 0$.
[See C. Davis and R. Bhatia, AMM 107 (2000), 353-356, for generalizations.]
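
One consequence of the inequality in answer 18 is worth spelling out: subtracting $(\mathrm{E}\,X)^2$ from $\mathrm{E}\,X^2 \le (M + m)\,\mathrm{E}\,X - mM$ gives $\mathrm{var}(X) \le (M - \mathrm{E}\,X)(\mathrm{E}\,X - m)$ whenever $m \le X \le M$. The sketch below checks this on a small distribution that is invented purely for illustration; the helper name `moments` is likewise hypothetical.

```python
from fractions import Fraction

def moments(dist: dict):
    """Mean and variance of a finite distribution given as {value: probability}."""
    mean = sum(v * pr for v, pr in dist.items())
    second = sum(v * v * pr for v, pr in dist.items())
    return mean, second - mean * mean

# A hypothetical example distribution on {0, 1, 4}.
dist = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}
m, M = min(dist), max(dist)
mean, var = moments(dist)
upper = (M - mean) * (mean - m)
print(var, upper)
```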

19. (a) The binary values of $\Pr(X_n = 1) = \mathrm{E}(X_n)$ for $n = 0, 1, 2, \ldots$, are respectively
$(.0101010101010101\ldots)_2$, $(.0011001100110011\ldots)_2$, $(.0000111100001111\ldots)_2$,
...; thus they’re the complemented reflections of the “magic masks” 7.1.3-(47). The
answer is therefore $(2^{2^n} - 1)/(2^{2^{n+1}} - 1) = 1/(2^{2^n} + 1)$.

(b) $\Pr(X_0X_1\ldots X_{n-1} = x_0x_1\ldots x_{n-1}) = 2^{(\bar x_{n-1}\ldots\bar x_1\bar x_0)_2}/(2^{2^n} - 1)$ can be “read off”
from the magic masks by ANDing and complementing. [See E. Lukacs, Characteristic
Functions (1960), 119, for related theory.]

(c) The infinite sum $S$ is well defined because $\Pr(S = \infty) = 0$. Its expectation
$\mathrm{E}\,S = \sum_{n=0}^{\infty} 1/(2^{2^n} + 1) \approx 0.59606$ corresponds to the case $z = 1/2$ in answer 7.1.3-41(c).
By independence, $\mathrm{var}\,S = \sum_{n=0}^{\infty} \mathrm{var}\,X_n = \sum_{n=0}^{\infty} 2^{2^n}/(2^{2^n} + 1)^2 \approx 0.44148$.

(d) The parity number $\mathrm{E}\,R = (.0110100110010110\ldots)_2$ has the decimal value

0.41245 40336 40107 59778 33613 68258 45528 30895-,

and can be shown to equal $\frac12 - \frac14 P$ where $P = \prod_{k=0}^{\infty}(1 - 1/2^{2^k})$ [R. W. Gosper and
R. Schroeppel, MIT AI Laboratory Memo 239 (29 February 1972), Hack 122], which is
transcendental [K. Mahler, Mathematische Annalen 101 (1929), 342-366; 103 (1930),
532]. (Furthermore it turns out that $1/P - 1/2 = \sum_{k=0}^{\infty} 1/\prod_{j=0}^{k-1}(2^{2^j} - 1)$.) Since $R$ is
binary, $\mathrm{var}(R) = (\mathrm{E}\,R)(1 - \mathrm{E}\,R) \approx 0.242336$.
(e) Zero (because $\pi$ is irrational, hence $p_0 + p_1 + \cdots = \infty$). However, if we ask the
analogous question for Euler’s constant $\gamma$ instead of $\pi$, nobody knows the answer.

(f) $\mathrm{E}\,Y_n = 2\,\mathrm{E}\,X_n$; in fact, $\Pr(Y_0Y_1Y_2\ldots = x_0x_1x_2\ldots)$, for any infinite string
$x_0x_1x_2\ldots$, is equal to $2\Pr(X_0X_1X_2\ldots = x_0x_1x_2\ldots) \bmod 1$, because we shift the
binary representation one place to the left (and drop any carry). Thus in particular,
$\mathrm{E}\,Y_mY_n = 2\,\mathrm{E}\,X_mX_n = \frac12\,\mathrm{E}\,Y_m\,\mathrm{E}\,Y_n$ when $m \ne n$; $Y_m$ and $Y_n$ are negatively correlated
because $\mathrm{covar}(Y_m, Y_n) = -\frac12\,\mathrm{E}\,Y_m\,\mathrm{E}\,Y_n$.

(g) Clearly $\mathrm{E}\,T = 2\,\mathrm{E}\,S$. Also $\mathrm{E}\,T^2 = 2\,\mathrm{E}\,S^2$, because $\mathrm{E}\,Y_mY_n = 2\,\mathrm{E}\,X_mX_n$ for all
$m$ and $n$. So $\mathrm{var}(T) = 2(\mathrm{var}(S) + (\mathrm{E}\,S)^2) - (2\,\mathrm{E}\,S)^2 = 2\,\mathrm{var}(S) - 2(\mathrm{E}\,S)^2 \approx 0.17237$.
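
The closed forms and decimal values quoted in parts (a), (c), and (g) of answer 19 can be reproduced with exact rational arithmetic. The sketch below builds each repeating binary fraction from its period, then sums the first six terms (an arbitrary cutoff; later terms are far below the quoted precision). The function name `mask_value` is hypothetical.

```python
from fractions import Fraction

def mask_value(n: int) -> Fraction:
    """Value of the binary fraction whose period is 2^n zeros followed by 2^n ones."""
    half = 2 ** n
    period = int("0" * half + "1" * half, 2)        # one period, read as an integer
    return Fraction(period, 2 ** (2 * half) - 1)    # repeating fraction = period / (2^length - 1)

terms = [mask_value(n) for n in range(6)]           # the omitted terms are below 1e-19
ES = sum(terms)
var_S = sum(p * (1 - p) for p in terms)             # independent bits, so variances add
var_T = 2 * var_S - 2 * ES * ES                     # the formula from part (g)
print(float(ES), float(var_S), float(var_T))
```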

20. Let $p_j = \mathrm{E}\,X_j$. We must prove, for example, that $\mathrm{E}(X_1(1 - X_2)(1 - X_3)X_4) =
p_1(1 - p_2)(1 - p_3)p_4$ when $k \ge 4$. But this is $\mathrm{E}(X_1X_4 - X_1X_2X_4 - X_1X_3X_4 +
X_1X_2X_3X_4) = p_1p_4 - p_1p_2p_4 - p_1p_3p_4 + p_1p_2p_3p_4$.

21. From the previous exercise we know that they can’t both be binary. Let X be 
binary and Y ternary, taking the values (0,0), (0,1), (0,2), (1,0) ， (1,1) ， (1 ， 2) with 
probabilities respectively proportional to (a ， 6 , 3a + 6 + 3d ， d ， 1 ， 1). Then EXY = 3/D ， 
EX = 2/D^ and EY = 3/2, where D = Aa + 2b + Ad + 2. 

22. By (8) we have $\Pr(A_1 \cup \cdots \cup A_n) = \mathrm{E}[A_1 \cup \cdots \cup A_n] = \mathrm{E}\max([A_1], \ldots, [A_n]) \le
\mathrm{E}([A_1] + \cdots + [A_n]) = \mathrm{E}[A_1] + \cdots + \mathrm{E}[A_n] = \Pr(A_1) + \cdots + \Pr(A_n)$.

23. The hinted probability is Pr(X 5 = 0 and X\ H -+ X s -i = s — r), so it equals 

(1 — p) r • To get sum it for r = n — m and n — m < s < n. [For an 

algebraic rather than probabilistic / combinatorial proof, see CMath, exercise 8.17.] 


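24. (a) The derivative of $B_{m,n}(x) = \sum_{k=0}^{m}\binom{n}{k}x^k(1 - x)^{n-k}$ is

$$\sum_{k=0}^{m}\binom{n}{k}\bigl(kx^{k-1}(1 - x)^{n-k} - (n - k)x^k(1 - x)^{n-1-k}\bigr)
 = n\Bigl(\sum_{k=0}^{m-1}\binom{n-1}{k}x^k(1 - x)^{n-1-k} - \sum_{k=0}^{m}\binom{n-1}{k}x^k(1 - x)^{n-1-k}\Bigr)
 = -n\binom{n-1}{m}x^m(1 - x)^{n-1-m}.$$

[See Karl Pearson, Biometrika 16 (1924), 202-203.]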

(b) The hint, which says that J^/( a+6+1 ) x ft (l — x) b dx < J^ a+b+1 ^ x a (l — x) h dx 
when 0 < a < 6, will prove that 1 — J5 m ， n (m/n) < B m , n (m/n). It suffices to show that 
J^ /(a+b) x a (l-x) b dx < f^ /(a+b) x a (l-x) b dx, because we have/ 0 a/(a+6+1) < / 0 a/(a+6) < 

fa/(a+b) K fa/(a+b+i)- Let ^ = (a-e)/(a + 6), and observe that (a - e) a (b-\-e) b is less 
than or equal to (a + e) a (b — e) b for 0 < e < a, because the quantity 


a 


e 


a 


e 


) 


a 


e 


a(ln(l — e/a) — ln(l+e/a)) — 


-2e 1 


e 


2 


e 


4 


• • • 


3a 2 5a 4 


)) 


increases when a increases. 

(c) Let tk = (^)m k (n — m) n ~ k . When m > n/2 we can show that 1 — B m , n (jn/n) = 

5^/c>m 免 kI 〈 B m ^ n ( K Tnf — h—Q ^/ 几几 ， because for 1 ^ d ^ n — m 

For if r d = t m+d /t m +i^d^ we have n = m/(m + 1) < 1; also 


2 


Td+i _ (n — m + d) (n — m — d)m 
Td (m + 1 + d)(m + 1 — d)(n — m) 2 


<1， 
because $((m + 1)^2 - d^2)(n - m)^2 - ((n - m)^2 - d^2)m^2 = (2m + 1)(n - m)^2 + (2m - n)nd^2$.

[Peter Neumann proved in Wissenschaftliche Zeitschrift der Technischen Universität
Dresden 15 (1966), 223-226, that $m$ is the median. The argument in part (c) is
due to Nick Lord, in The Mathematical Gazette 94 (2010), 331-332.]


25. (a) ( ⑵ )— H)) is ^2piqj(qt/(n — k) — pt/{k + 1))，summed over all partitions 
of {1 ， •…， n} into disjoint sets / U J U where \I\ = \J\ = n — k — 1, pi = Yliei 

qj = Qj- And qt/(n - k) -pt/(k + 1) > 0 Pt < (k + l)/(n + 1). 

(b) Given pi, ••” p n -i^ the quantity ((^)) is maximized when p n = p, by (a). The 


same argument applies symmetrically to all indices j. 

鲁# ♦鲁 2 ♦ ♦ 

26. The inequality is equivalent to r n , k > r n ,/c—ir n ,/c+i ， which was stated without 
proof on pages 242-245 of Newton’s Arithmetica Universalis (1707), then finally proved 
by Sylvester many years later [Proc. London Math. Soc. 1 (1865) ， 1-16]- We have 
nr u ,k = — i ， /c—i+ (n —A:)g n r n 一 i ， /c; hence n(r 2 n ^ k - r n ,k-ir n ,k+i) = {p n r n -i,k-i - 

q n r n -i,k) 2 + (k 2 - l)p 2 n A (k - l)(n - 1 - k)p n q n B + ((n - k) 2 - l)q 2 n C, where A = 

i — 1,/c—2^n—1,/c ^ ^ = ”n—1,/c—l”n—1,/c — - l ， /c—2”n—l ， /c+l ， S-Ild C = T n _ — 

r n 一 i,/c—ir n 一 i,/c+i are nonnegative, by induction on n. 


27. ((fc)) = (^-^ 1 + -p n _ m+A; ), by the same argument as before. 

28. (a) dl)) = O + + ((111))0 and Eg(X) = E^o O, where 

A= (l- Pn-i)(l -Pn), c = Pn-iPn, B = 1 - A - C, and h k = Ag(k) + Bg(k + 1) + 
Cg(k + 2). If the pj’s aren’t all equal, we may assume that p n -i < p < p n - Setting 
Pn-i = Pn-i + e and p n = p n - where e = min(p n - p^p - J>n-i )， changes 

C to A! = A + 8^ B r = B — 25, C’ = (7 + J，where S = (p n — p){p — p n -i)] hence 
hk changes to h r k = hk + S(g(k) — 2g(k + 1) + g(k + 2)). Convex functions satisfy 
g{k) — 2g(k + 1) + g{k + 2) > 0, by ( 19 ) with x = k and y = A: + 2; hence we can 
permute the p’s and repeat this transformation until pj = p for 1 < j < n. 

(b) Suppose E g(X) is maximum, and that r of the p's are 0 and s of them are 1. Let a satisfy $(n - r - s)a + s = np$ and assume that $0 < p_{n-1} < a < p_n < 1$. As in part (a) we can write $\mathrm{E}\,g(X) = \alpha A + \beta B + \gamma C$ for some coefficients $\alpha$, $\beta$, $\gamma$.

If $\alpha - 2\beta + \gamma > 0$, the transformation in (a) (but with a in place of p) would increase E g(X). And if $\alpha - 2\beta + \gamma < 0$, we could increase it with a similar transformation, using $\delta = -\min(p_{n-1},\, 1 - p_n)$. Therefore $\alpha - 2\beta + \gamma = 0$; and we can repeat the transformation of (a) until every $p_j$ is 0, 1, or a.

(c) Since (©)=0 when s > m，we may assume that s < hence r + 5 < n. 

For this function g(k) = [0 < A: < m] we have a — 2/3 + 7 = (( n ~ 2 )) — (( 二二 2 J). This 
difference cannot be positive if the choice of {pi, ••” p n } is optimum; in particular we 


cannot have s = m. If r > 0 we can make p n -i = 0 and p n = a, so that (( n_2 ))= 

- a) n - r - 1 ~ m and ((^)) = ( 二 二二丄 ^ 爪 ― 卜 "(l _ a) n —But 

then the ratio (( n ~ 2 )) / ((::!)) = (n — r — m)a/((m — s)(1 — a)) exceeds 1; hence r = 0. 

Similarly if 5 > 0 we can set (Pn - i ， Pn) = (a ， 1)，getting the ratio (( n ~ 2 )) / (( 二二 2 J)= 

(n — 1 — m)a/((m — s 1)(1 — a)) > 1. In this case (( n ~ 2 )) = (Q:;)) if and only if 
np = m + 1; we can transform without changing E g(X)^ until 5 = 0 and each pj = p. 
[Reference: Annals of Mathematical Statistics 27 (1956), 713-721. The coefficients $\left(\!\binom{n}{k}\!\right)$ also have many other important properties; see exercise 7.2.1.5-63, and the survey by J. Pitman in J. Combinatorial Theory A77 (1997), 279-303.]


29. The result is obvious when m = 0 or n; and there’s a direct proof when m = n — 1: 
B n -i^ n (p) = 1 — p n > (1 - p)n/((l - p)n -f p) because p - np n + (n _ l)p n+1 = 
p(l — p)(l + p + +p n_1 — p n ~~ l n) > 0. The result is also clear when p = 0 or 1. 
If p = (m + l)/n we have R m ,n(p) = ((1 - p)(m + 1)/((1 - p)m + l).) n_m = 
((n —m — l)/(n —So if m > 0 and p = m/(n — 1)，we can apply exercise 28(c) 
with pi = … =pn-i = p and p n = 1 : 

BmAp) > Elo O = Er =0 {T-\)p k ~ l i)--p) n - k = 

When 1 < m < n - 1, let Q m ,n(p) = - R m ， n(p). The derivative 

Q'mAp) =( 几 - ^)0(1- pT^-^A- F(p ))/((1 -p)m + 1 ) — +1 , 

where A = (m l) n_m /( 二 ) > 1 and F(p) = p m ((l — p)m H- l) n_m+1 , begins positive 
at p = 0, eventually becomes negative but then is positive again at p = 1 - (Notice that 
F(0) = 0, and F(p) increases dramatically until p = (m + l)/(n + 1); then it decreases 
to F(l) = 1.) The facts that > 0 = Q m n (0) = Q m ， n(l) now complete the 

proof，because Q’ m ， n (p) changes sign only once in [0 … [Annals of Mathematical 
Statistics 36 (1965), 1272-1278.] 

30. (a) $\Pr(X_k = 0) = n/(n+1)$; hence $p = n^n/(n+1)^n > 1/e \approx 0.368$.

(b) (Solution by J. H. Elton.) Let $p_{km} = \Pr(X_k = m)$. Assume that these probabilities are fixed for $1 \le k < n$, and let $x_m = p_{nm}$. Then $x_0 = x_2 + 2x_3 + 3x_4 + \cdots$; we want to minimize $p = \sum_{m\ge1}(A_m + (m-1)A_0)x_m$ in nonnegative variables $x_1, x_2, \ldots$, where $A_m = \Pr(X_1 + \cdots + X_{n-1} \le n - m)$, subject to the condition $\sum_{m\ge1} m x_m = 1$. Since all coefficients of p are nonnegative, the minimum is achieved when all $x_m$ for $m \ge 1$ are zero except for one value $m = m_n$, which minimizes $(A_m + (m-1)A_0)/m$. And $m_n \le n + 1$, because $A_m = 0$ whenever m > n. Similarly $m_1, \ldots, m_{n-1}$ also exist.

(c) (Solution by E. Schulte-Geers.) Letting $m_1 = \cdots = m_n = t \le n + 1$, we want to minimize $B_{\lfloor n/t\rfloor,n}(1/t)$. The inequality of Samuels in exercise 29 implies that

$$B_{m,n}(p) \ge \Bigl(1 - \frac{1}{f(m,n,p)+1}\Bigr)^{n} \quad\text{for } p < \frac{m+1}{n+1}, \qquad\text{where } f(m,n,p) = \frac{(m+1)(1-p)n}{(n-m)p},$$

because we can set $x = ((1-p)m + 1)/((1-p)(m+1))$ in the arithmetic-geometric mean inequality $x^{n-m} \le ((n-m)x + m)^n/n^n$. Now $1/t < (\lfloor n/t\rfloor + 1)/(n+1)$ and $f(\lfloor n/t\rfloor, n, 1/t) \ge n$; hence $B_{\lfloor n/t\rfloor,n}(1/t) \ge n^n/(n+1)^n$.

[Peter Winkler called this the "gumball machine problem" in CACM 52, 8 (August 2009), 104-105. J. H. Elton has verified that the joint distributions in (a) are optimum when $n \le 20$; see arXiv:0908.3528 [math.PR] (2009), 7 pages. Do those distributions in fact minimize p for all n? Uriel Feige has conjectured more generally that we have $\Pr(X_1 + \cdots + X_n < n + 1/(e-1)) \ge 1/e$ whenever $X_1, \ldots, X_n$ are independent nonnegative random variables with $\mathrm{E}X_k \le 1$; see SICOMP 35 (2006), 964-984.]

31. This result is immediate because $\Pr(f([A_1], \ldots, [A_n])) = \mathrm{E}\, f([A_1], \ldots, [A_n])$. But a more detailed, lower-level proof will be helpful with respect to exercise 32.

Suppose, for example, that n = 4. The reliability polynomial is the sum of the reliability polynomials for the minterms of f; so it suffices to show that the result is true for functions like $x_1 \wedge \bar x_2 \wedge \bar x_3 \wedge x_4 = x_1(1 - x_2)(1 - x_3)x_4$. And it's clear that

$\Pr(A_1 \cap \bar A_2 \cap \bar A_3 \cap A_4) = \Pr(A_1 \cap \bar A_2 \cap A_4) - \Pr(A_1 \cap \bar A_2 \cap A_3 \cap A_4) = \pi_{14} - \pi_{124} - \pi_{134} + \pi_{1234}$.

(See exercise 7.1.1-12; also recall the inclusion-exclusion principle.)

32. The $2^n$ minterm probabilities in the previous answer must all be nonnegative, and they must sum to 1. We've already stipulated that $\pi_\emptyset = 1$, so the sum-to-1 condition is automatically satisfied. (The condition stated in the exercise when $I \subseteq J$ is necessary but not sufficient; for example, $\pi_{12}$ must be $\ge \pi_1 + \pi_2 - 1$.)
33. The three events (X, Y) = (1, 0), (0, 1), (1, 1) occur with probabilities p, q, r, respectively. The value of E(X | Y) is 1, r/(q + r), r/(q + r) in those cases. Hence the answer is $pz + (q + r)z^{r/(q+r)}$. (This example demonstrates why univariate generating functions are not used in the study of conditional random variables such as E(X | Y).
But we do have the simple formula E(X | Y=k) = ([z k ] z)) / ([z k ] G(l, z)).) 

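34. The right-hand side is

$$\sum_{\omega} \mathrm{E}(X \mid Y)(\omega)\Pr(\omega) = \sum_{\omega}\Pr(\omega)\sum_{\omega'} X(\omega')\Pr(\omega')\,[Y(\omega') = Y(\omega)]\,/\Pr(Y = Y(\omega))$$

$$= \sum_{\omega}\Pr(\omega)\sum_{\omega'} X(\omega')\Pr(\omega')\,[Y(\omega') = Y(\omega)]\,/\Pr(Y = Y(\omega'))$$

$$= \sum_{\omega'} X(\omega')\Pr(\omega')\sum_{\omega}\Pr(\omega)\,[Y(\omega) = Y(\omega')]\,/\Pr(Y = Y(\omega')).$$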

35. Part (b) is false. If, for instance, X and Y are independent random bits and Z = X, we have $\mathrm{E}(X \mid Y) = \frac12$ and $\mathrm{E}(\frac12 \mid Z) = \frac12 \ne X = \mathrm{E}(X \mid Z)$. The correct formula instead of (b) is

E(E(X | Y, Z) | Z) = E(X | Z). (*)

This is (12) in the probability spaces conditioned by Z, and it is the crucial identity that underlies exercise 91. Part (a) is true because it is the case Y = Z of (*).

36. (a) f(X); (b) E(f(Y)g(X)), generalizing (12). Proof: $\mathrm{E}(f(Y)\,\mathrm{E}(g(X) \mid Y)) = \sum_y f(y)\,\mathrm{E}(g(X) \mid Y = y)\Pr(Y = y) = \sum_{x,y} f(y)g(x)\Pr(X = x, Y = y) = \mathrm{E}(f(Y)g(X))$.

37. If we're given the values of $X_1, \ldots, X_{k-1}$, the value of $X_k$ is equally likely to be any of the n + 1 - k values in $\{1, \ldots, n\} \setminus \{X_1, \ldots, X_{k-1}\}$. Hence its average value is $(1 + \cdots + n - X_1 - \cdots - X_{k-1})/(n + 1 - k)$. We conclude that $\mathrm{E}(X_k \mid X_1, \ldots, X_{k-1}) = (n(n+1)/2 - X_1 - \cdots - X_{k-1})/(n + 1 - k)$. [Incidentally, the sequence $Z_0, Z_1, \ldots$, defined by $Z_j = (n+j)X_1 + (n+j-2)X_2 + \cdots + (n-j)X_{j+1} - (j+1)n(n+1)/2$ for $0 \le j < n$ and $Z_j = Z_{n-1}$ for $j \ge n$, is therefore a martingale.]
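
The martingale remark in answer 37 can be checked by brute force for small n. The following is a minimal sketch, assuming (as the answer suggests) that $X_1, \ldots, X_n$ is a uniformly random permutation of {1, ..., n}; the helper names `z_value` and `is_martingale` are illustrative and not part of the text.

```python
from fractions import Fraction
from itertools import permutations

def z_value(perm, j, n):
    """Z_j = (n+j)X_1 + (n+j-2)X_2 + ... + (n-j)X_{j+1} - (j+1)n(n+1)/2, for 0 <= j < n."""
    total = sum((n + j - 2 * i) * perm[i] for i in range(j + 1))
    return Fraction(total) - Fraction((j + 1) * n * (n + 1), 2)

def is_martingale(n):
    """Check that the average of Z_{j+1}, given X_1..X_{j+1}, equals Z_j."""
    perms = list(permutations(range(1, n + 1)))
    for j in range(n - 1):
        groups = {}
        for perm in perms:
            # group the equally likely permutations by their first j+1 values
            groups.setdefault(perm[: j + 1], []).append(perm)
        for members in groups.values():
            avg_next = sum(z_value(p, j + 1, n) for p in members) / len(members)
            if avg_next != z_value(members[0], j, n):
                return False
    return True
```

Exact rational arithmetic is used so that the equality test is not affected by rounding.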

38. Let $t_{m,n}$ be the number of restricted growth strings of length m + n that begin with 01...(m-1). (This is the number of set partitions of {1, ..., m + n} in which each of {1, ..., m} appears in a different block.) The generating function $\sum_{n\ge0} t_{m,n} z^n/n!$ turns out to be $\exp(e^z - 1 + mz)$; hence $t_{m,n} = \sum_k \varpi_k \binom{n}{k} m^{n-k}$.

Suppose $M = \max(X_0, \ldots, X_{k-1}) + 1$. Then $\Pr(X_k = j) = t_{M,n-k}/t_{M,n+1-k}$ for $0 \le j < M$, and $t_{M+1,n-k}/t_{M,n+1-k}$ for j = M. Hence $\mathrm{E}(X_k \mid X_0, \ldots, X_{k-1}) = \bigl(\binom{M}{2} t_{M,n-k} + M t_{M+1,n-k}\bigr)/t_{M,n+1-k}$.

39. (a) Since E(K | N = n) = pn we have E(K | N) = pN.

(b) Hence $\mathrm{E}K = \mathrm{E}(\mathrm{E}(K \mid N)) = \mathrm{E}\,pN = p\mu$.

(c) Let $p_{nk} = \Pr(N = n, K = k) = (e^{-\mu}\mu^n/n!) \times \binom{n}{k}p^k(1-p)^{n-k} = (e^{-\mu}\mu^k p^k/k!)\, f(n-k)$, where $f(n) = (1-p)^n\mu^n/n!$. Then $\mathrm{E}(N \mid K = k) = \sum_n n p_{nk}/\sum_n p_{nk}$. Since $n f(n-k) = k f(n-k) + (n-k) f(n-k)$ and $n f(n) = (1-p)\mu f(n-1)$, the answer is $k + (1-p)\mu$; hence $\mathrm{E}(N \mid K) = K + (1-p)\mu$. [G. Grimmett and D. Stirzaker, Probability and Random Processes (Oxford: 1982), §3.7.]
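
The closed form in part (c) of answer 39 can be sanity-checked numerically. This is a small sketch that relies on the observation in the answer that $p_{nk}$ is proportional to f(n - k); the function name and the truncation parameter `terms` are assumptions made for illustration.

```python
def cond_mean_N_given_K(k, p, mu, terms=200):
    """Approximate E(N | K = k) = sum_n n*p_nk / sum_n p_nk by truncated summation.

    Only the factor f(n-k) = ((1-p)*mu)**(n-k) / (n-k)! depends on n,
    so the common factor e^{-mu} mu^k p^k / k! cancels in the ratio.
    """
    weight = 1.0          # f(0)
    num = den = 0.0
    for j in range(terms):            # j = n - k
        num += (k + j) * weight
        den += weight
        weight *= (1 - p) * mu / (j + 1)   # f(j+1) from f(j)
    return num / den
```

With k = 3, p = 0.25, and mu = 4 the result should agree with k + (1 - p)mu = 6 up to rounding.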

40. If p = Pr(X > m), clearly $\mathrm{E}X \le (1 - p)m + pM$. [We also get this result from (15), by taking $S = \{x \mid x \le m\}$, $f(x) = M - x$, $s = M - m$.]

41. (a) Convex when $a \ge 1$ or a = 0; otherwise neither convex nor concave. (However, $x^a$ is concave when 0 < a < 1 and convex when a < 0, if we consider only positive values of x.) (b) Convex when n is even or n = 1; otherwise neither convex nor concave. (This function is $\int_0^x t^{n-1}e^{x-t}\,dt/(n-1)!$, according to 1.2.11.3-(5); so $f''(x)/x > 0$ when $n \ge 3$ is odd.) (c) Convex. (In fact f(|x|) is convex whenever f(z) has a power series with nonnegative coefficients, convergent for all z.) (d) Convex, provided of course that we allow f to be infinite in the definition (19).

42. We can show by induction on n that $f(p_1x_1 + \cdots + p_nx_n) \le p_1f(x_1) + \cdots + p_nf(x_n)$, when $p_1, \ldots, p_n \ge 0$ and $p_1 + \cdots + p_n = 1$, as in exercise 6.2.2-36. The general result follows by taking limits as $n \to \infty$. [The quantity $p_1x_1 + \cdots + p_nx_n$ is called a "convex combination" of $\{x_1, \ldots, x_n\}$; similarly, EX is a convex combination of X values. Jensen actually began his study by assuming only the case $p = q = \frac12$ of (19).]

43. $f(\mathrm{E}X) = f(\mathrm{E}(\mathrm{E}(X \mid Y))) \le \mathrm{E}(f(\mathrm{E}(X \mid Y))) \le \mathrm{E}(\mathrm{E}(f(X) \mid Y)) = \mathrm{E}f(X)$. [S. M. Ross, Probability Models for Computer Science (2002), Lemma 3.2.1.]

44. The function f(xy) is convex in y for any fixed x. Therefore g(y) = E f(Xy) is convex in y: It's a convex combination of convex functions. Also $g(y) \ge f(\mathrm{E}Xy) = f(0) = g(0)$ by (20). Hence $0 \le a \le b$ implies $g(0) \le g(a) \le g(b)$ by convexity of g. [S. Boyd and L. Vandenberghe, Convex Optimization (2004), exercise 3.10.]

45. $\Pr(X > 0) = \Pr(|X| \ge 1)$; set m = 1 in (16).

46. $\mathrm{E}X^2 \ge (\mathrm{E}X)^2$ in any probability distribution, by Jensen's inequality, because squaring is convex. We can also prove it directly, since $\mathrm{E}X^2 - (\mathrm{E}X)^2 = \mathrm{E}(X - \mathrm{E}X)^2$.

47. We always have $Y \ge X$ and $Y^2 \le X^2$. (Consequently (22) yields $\Pr(X > 0) = \Pr(Y > 0) \ge (\mathrm{E}Y)^2/(\mathrm{E}Y^2) \ge (\mathrm{E}X)^2/(\mathrm{E}X^2)$ when $\mathrm{E}X \ge 0$.)

48. $\Pr(a - X_1 - \cdots - X_n > 0) \ge a^2/(a^2 + \sigma_1^2 + \cdots + \sigma_n^2)$, by exercise 47. [This inequality was also known to Chebyshev; see J. Math. Pures et Appl. (2) 19 (1874), 157-160. In the special case n = 1 it is equivalent to "Cantelli's inequality,"

$\Pr(X \ge \mathrm{E}X + a) \le \mathrm{var}(X)/(\mathrm{var}(X) + a^2)$, for $a \ge 0$;

see Atti del Congresso Internazionale dei Matematici 6 (Bologna: 1928), 47-59, §6-§7.]

49. $\Pr(X = 0) = 1 - \Pr(X > 0) \le (\mathrm{E}X^2 - (\mathrm{E}X)^2)/\mathrm{E}X^2 \le (\mathrm{E}X^2 - (\mathrm{E}X)^2)/(\mathrm{E}X)^2 = (\mathrm{E}X^2)/(\mathrm{E}X)^2 - 1$. [Some authors call this inequality the "second moment principle," but it is strictly weaker than (22).]

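50. (a) Let $Y_j = X_j/X$ if $X_j > 0$, otherwise $Y_j = 0$. Then $Y_1 + \cdots + Y_m = [X > 0]$. Hence $\Pr(X > 0) = \sum_{j=1}^m \mathrm{E}Y_j$; and $\mathrm{E}Y_j = \mathrm{E}(X_j/X \mid X_j > 0) \cdot \Pr(X_j > 0)$. [This identity, which requires only that $X_j \ge 0$, is elementary yet nonlinear, so it apparently lay undiscovered for many years. See D. Aldous, Discrete Math. 76 (1989), 168.]

(b) Since $X_j \in \{0, 1\}$, we have $\Pr(X_j > 0) = \mathrm{E}X_j = p_j$; and $\mathrm{E}(X_j/X \mid X_j > 0) = \mathrm{E}(X_j/X \mid X_j = 1) = \mathrm{E}(1/X \mid X_j = 1) \ge 1/\mathrm{E}(X \mid X_j = 1)$.

(c) $\Pr(X_J = 1) = \sum_{j=1}^m \Pr(J = j \text{ and } X_j = 1) = \sum_{j=1}^m p_j/m = \mathrm{E}X/m$. Hence $\Pr(J = j \mid X_J = 1) = \Pr(J = j \text{ and } X_j = 1)/\Pr(X_J = 1) = (p_j/m)/(\mathrm{E}X/m) = p_j/\mathrm{E}X$.

(d) Since J is independent we have $t_j = \mathrm{E}(X \mid J = j \text{ and } X_j = 1) = \mathrm{E}(X \mid X_j = 1)$.

(e) The right side is $(\mathrm{E}X)\sum_{j=1}^m (p_j/\mathrm{E}X)/t_j \ge (\mathrm{E}X)/\sum_{j=1}^m (p_j/\mathrm{E}X)\,t_j$.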

51. If g(qi) … ,qm) = 1 — f (pi) … ,p m ) is the dual of /， where qj = 1 — Pj，a lower 
bound on g gives an upper bound on f. For example, when / is ^1X2^3 Vx‘2 尤 3 尤 4 VX4X5, 
f is X 1 X 4 + X 2 X 4 + xsxa + X 2 X 5 + ^3^5 - So the inequality (24) gives g(q^ … > 

QlQA：/ (1+^2 +^3 + Q2 ^5 +^ 3 ^ 5 ) +^2^4 / (^1 + 1 + ^3 + ^5 +Q3Q5) +^3^4 / (^1 +^2 + 1 +^2^5 + 

仍 ) + q2q5/(qiq4 + 仍 + ^3^4 + 1 + 奶 ) + qzqbj (gi^4 + q ： iq\ + 仍 + 奶 + 1). In particular, 

$g(.1, \ldots, .1) > 0.039$ and $f(.9, \ldots, .9) < 0.961$.

於 . ( n k )P k /Y ： U0O 3 . 
53. $f(p_1, \ldots, p_6) \ge p_1p_2(1 - p_3)/(1 + p_4p_5(1 - p_6)) + \cdots + p_6p_1(1 - p_2)/(1 + p_3p_4(1 - p_5))$.

Monotonicity is not required when applying this method, nor need the implicants be prime. The result is exact when the implicants are disjoint.

54. (a) $\Pr(X > 0) \le \mathrm{E}X = \binom{n}{3}p^3$, because $\mathrm{E}X_{uvw} = p^3$ for all u < v < w.

(b) $\Pr(X > 0) \ge (\mathrm{E}X)^2/(\mathrm{E}X^2)$, where the numerator is the square of (a) and the denominator can be shown to be $\binom{n}{3}p^3 + 12\binom{n}{4}p^5 + 30\binom{n}{5}p^6 + 20\binom{n}{6}p^6$. For example, the expansion of $X^2$ contains 12 terms of the form $X_{uvw}X_{uvw'}$ with u < v < w < w', and each of those terms has expected value $p^5$.

55. A BDD for the corresponding Boolean function of $\binom{10}{2} = 45$ variables has about 1.4 million nodes, and allows us to evaluate the true probability $(1 - p)^{45}G(p/(1 - p))$ exactly, where G(z) is the corresponding generating function (see exercise 7.1.4-25). The results are: (a) $30/37 \approx .811 < 35165158461687/2^{45} \approx .999 < 15$; (b) $10/109 \approx .092 < 4180246784470862526910349589019919032987399/(4 \times 10^{43}) \approx .105 < .12$.
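
The lower and upper bounds quoted in answer 55 follow from the first and second moments in answer 54, and they can be recomputed exactly. The sketch below assumes, as those figures indicate, that n = 10 with p = 1/2 in case (a) and p = 1/10 in case (b); the function names are illustrative. The helper `pair_counts` also confirms the coefficients 1, 12, 30, 20 by counting ordered pairs of triples according to how many vertices they share.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def triangle_bounds(n, p):
    """Return (second-moment lower bound, first-moment upper bound) on Pr(X > 0)."""
    ex = comb(n, 3) * p**3
    ex2 = (comb(n, 3) * p**3 + 12 * comb(n, 4) * p**5
           + 30 * comb(n, 5) * p**6 + 20 * comb(n, 6) * p**6)
    return ex**2 / ex2, ex

def pair_counts(n):
    """counts[t] = number of ordered pairs of 3-subsets of {0..n-1} sharing exactly t vertices."""
    counts = [0, 0, 0, 0]
    triples = [set(t) for t in combinations(range(n), 3)]
    for s in triples:
        for t in triples:
            counts[len(s & t)] += 1
    return counts
```

Two triangles that share t vertices have 6 - t(t-1)/2 edges in their union, which is where the exponents 3, 5, 6, 6 come from.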

56. The upper bound is $\mu = \lambda^3/6$; the lower bound divides this by $1 + \mu$. [The exact asymptotic value can be obtained using the principle of inclusion and exclusion and its "bracketing" property, as in Eq. 7.2.1.4-(48); the result is $1 - e^{-\mu}$. See P. Erdős and A. Rényi, Magyar Tudományos Akadémia Mat. Kut. Int. Közl. 5 (1960), 17-61, §3.]

57. To compute $\mathrm{E}(X \mid X_{uvw} = 1)$ we sum $\Pr(X_{u'v'w'} = 1 \mid X_{uvw} = 1)$ over all $\binom{n}{3}$ choices of u' < v' < w'. If $\{u', v', w'\} \cap \{u, v, w\}$ has t elements, this probability is $p^{3 - t(t-1)/2}$; and there are $\binom{3}{t}\binom{n-3}{3-t}$ such cases. Consequently we get

$$\Pr(X > 0) \ge \binom{n}{3}p^3 \Big/ \Bigl(\binom{n-3}{3}p^3 + 3\binom{n-3}{2}p^3 + 3\binom{n-3}{1}p^2 + \binom{n-3}{0}p^0\Bigr).$$

[In this problem the lower bound turns out to be the same using either inequality; but the derivation here was easier.]

58. $\Pr(X > 0) \le \binom{n}{k}p^{k(k-1)/2}$. The lower bound, using the conditional expectation inequality as in the previous answer, divides this by $\sum_{t=0}^{k}\binom{k}{t}\binom{n-k}{k-t}p^{k(k-1)/2 - t(t-1)/2}$.

59. (a) The hypotheses imply that $a_0a_1b_0b_1 \le c_0c_1d_0d_1$. The key observation is that

$$c_1d_0\bigl((c_0 + c_1)(d_0 + d_1) - (a_0 + a_1)(b_0 + b_1)\bigr) = c_1d_0(c_0d_0 - a_0b_0 + c_1d_1 - a_1b_1) + (c_1d_0 - a_0b_1)(c_1d_0 - a_1b_0) + c_0c_1d_0d_1 - a_0a_1b_0b_1.$$

Thus the result holds when $c_1d_0 \ne 0$. If $c_1 = 0$ we have $a_0b_0 + a_0b_1 + a_1b_0 + a_1b_1 = a_0b_0 \le c_0d_0 \le c_0(d_0 + d_1)$. And a similar argument applies to the case $d_0 = 0$.

All four hypotheses hold with equality when $a_0 = b_0 = d_0 = 0$ and the other variables are 1, yet the conclusion is that 1 < 2. Conversely, when $b_1 = c_1 = 2$ and the other variables are 1, we have $a_1b_0 < c_1d_0$ but conclude only that $6 \le 6$.
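
The n = 1 case in part (a) is small enough to test exhaustively over a few integer values. The sketch below assumes the four hypotheses have the form $a_jb_k \le c_{j|k}d_{j\&k}$ (consistent with part (b) of this answer, although the exercise statement itself is not reproduced here); the function name is illustrative.

```python
from itertools import product

def check_four_functions_n1(values=(0, 1, 2)):
    """Count tuples satisfying the hypotheses; assert the conclusion and the key identity."""
    count = 0
    for a0, a1, b0, b1, c0, c1, d0, d1 in product(values, repeat=8):
        # key identity from the answer (it is a polynomial identity, so it holds for every tuple)
        lhs = c1 * d0 * ((c0 + c1) * (d0 + d1) - (a0 + a1) * (b0 + b1))
        rhs = (c1 * d0 * (c0 * d0 - a0 * b0 + c1 * d1 - a1 * b1)
               + (c1 * d0 - a0 * b1) * (c1 * d0 - a1 * b0)
               + c0 * c1 * d0 * d1 - a0 * a1 * b0 * b1)
        assert lhs == rhs
        # assumed hypotheses: a_j b_k <= c_{j|k} d_{j&k} for j, k in {0, 1}
        if (a0 * b0 <= c0 * d0 and a0 * b1 <= c1 * d0
                and a1 * b0 <= c1 * d0 and a1 * b1 <= c1 * d1):
            assert (a0 + a1) * (b0 + b1) <= (c0 + c1) * (d0 + d1)
            count += 1
    return count
```

The two boundary examples at the end of part (a) are among the tuples enumerated here.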

(b) Let $A_l = \sum\{a_{2j+l} \mid 0 \le j < 2^{n-1}\}$ for l = 0 and l = 1, and define $B_l$, $C_l$, $D_l$ similarly from $b_{2j+l}$, $c_{2j+l}$, $d_{2j+l}$. The hypotheses for j mod 2 = l and k mod 2 = m prove that $A_lB_m \le C_{l|m}D_{l\&m}$, by induction on n. Hence, by part (a), we have the desired inequality $(A_0 + A_1)(B_0 + B_1) \le (C_0 + C_1)(D_0 + D_1)$. [This result is due to R. Ahlswede and D. E. Daykin, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 43 (1978), 183-185, who stated it in the language of the next exercise.]

(c) Now let $A_n = a_0 + \cdots + a_{2^n-1}$, and define $B_n$, $C_n$, $D_n$ similarly. If $A_\infty B_\infty > C_\infty D_\infty$, we'll have $A_nB_n > C_\infty D_\infty$ for some n. But $C_\infty D_\infty \ge C_nD_n$, contra (b).

60. (a) We can consider each set to be a subset of the nonnegative integers. Let $\alpha'(S) = \alpha(S)[S \in \mathcal{F}]$, $\beta'(S) = \beta(S)[S \in \mathcal{G}]$, $\gamma'(S) = \gamma(S)[S \in \mathcal{F} \sqcup \mathcal{G}]$, $\delta'(S) = \delta(S)[S \in \mathcal{F} \sqcap \mathcal{G}]$; then $\alpha'(\wp) = \alpha(\mathcal{F})$, $\beta'(\wp) = \beta(\mathcal{G})$, $\gamma'(\wp) = \gamma(\mathcal{F} \sqcup \mathcal{G})$, and $\delta'(\wp) = \delta(\mathcal{F} \sqcap \mathcal{G})$, where $\wp$ is the family of all possible subsets. Since any set S of nonnegative integers can be encoded in the usual way as the binary number $s = \sum_{j \in S} 2^j$, the desired result follows from the four functions theorem if we let $a_s = \alpha'(S)$, $b_s = \beta'(S)$, $c_s = \gamma'(S)$, $d_s = \delta'(S)$.

(b) Let $\alpha(S) = \beta(S) = \gamma(S) = \delta(S) = 1$ for all sets S.

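61. (a) In the hinted case we can let $\alpha(S) = f(S)\mu(S)$, $\beta(S) = g(S)\mu(S)$, $\gamma(S) = f(S)g(S)\mu(S)$, $\delta(S) = \mu(S)$; the four functions theorem yields the result. The general case follows because we have $\mathrm{E}(fg) - \mathrm{E}(f)\,\mathrm{E}(g) = \mathrm{E}(\hat f\hat g) - \mathrm{E}(\hat f)\,\mathrm{E}(\hat g)$, where $\hat f(S) = f(S) - f(\emptyset)$ and $\hat g(S) = g(S) - g(\emptyset)$. [See Commun. Math. Physics 22 (1971), 89-103.]

(b) Changing f(S) to $\theta f(S)$ and g(S) to $\phi g(S)$ changes $\mathrm{E}(fg) - \mathrm{E}(f)\,\mathrm{E}(g)$ to $\theta\phi(\mathrm{E}(fg) - \mathrm{E}(f)\,\mathrm{E}(g))$, for all real numbers $\theta$ and $\phi$.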

(c) If S and T are supported, then $R = S \cap T$ and $U = S \cup T$ are supported. Furthermore we can write $S = R \cup \{s_1, \ldots, s_k\}$ and $T = R \cup \{t_1, \ldots, t_l\}$ where the sets $S_i = R \cup \{s_1, \ldots, s_i\}$ and $T_j = R \cup \{t_1, \ldots, t_j\}$ are supported, as are their unions $U_{i,j} = S_i \cup T_j$, for $0 \le i \le k$ and $0 \le j \le l$. By (iii) we know that $\mu(U_{i+1,j})/\mu(U_{i,j}) \le \mu(U_{i+1,j+1})/\mu(U_{i,j+1})$ when $0 \le i < k$ and $0 \le j < l$. Multiplying these inequalities for $0 \le i < k$, we obtain $\mu(U_{k,j})/\mu(U_{0,j}) \le \mu(U_{k,j+1})/\mu(U_{0,j+1})$. Hence $\mu(S)/\mu(R) = \mu(U_{k,0})/\mu(U_{0,0}) \le \mu(U_{k,l})/\mu(U_{0,l}) = \mu(U)/\mu(T)$.

(d) In fact, equality holds, because $[j \in S] + [j \in T] = [j \in S \cup T] + [j \in S \cap T]$. [Note: Random variables with this distribution are often confusingly called "Poisson trials," a term that conflicts with the (quite different) Poisson distribution of exercise 39.]

(e) Choose c in the following examples so that $\sum_S \mu(S) = 1$. In each case the supported sets are subsets of U = {1, ..., m}. (i) Let $\mu(S) = c\,r_1r_2 \ldots r_{|S|}$, where $0 < r_1 \le \cdots \le r_m$. (ii) Let $\mu(S) = c\,p_j$ when S = {1, ..., j} and $1 \le j \le m$, otherwise $\mu(S) = 0$. (If $p_1 = \cdots = p_m$ in this case, the FKG inequality reduces to Chebyshev's monotonic inequality of exercise 1.2.3-31.) (iii) Let

$\mu(S) = c\,\mu_1(S \cap U_1)\,\mu_2(S \cap U_2) \ldots \mu_k(S \cap U_k)$,

where each $\mu_j$ is a distribution on the subsets of $U_j \subseteq U$ that satisfies (**). The subuniverses $U_1, \ldots, U_k$ needn't be disjoint. (iv) Let $\mu(S) = c\,e^{-f(S)}$, where f is a submodular set function on the supported subsets of U: $f(S \cup T) + f(S \cap T) \le f(S) + f(T)$ whenever f(S) and f(T) are defined. (See Section 7.6.)

62. A Boolean function is essentially a set function whose values are 0 or 1. In general, under the Bernoulli distribution or any other distribution that satisfies the condition of exercise 61, the FKG inequality implies that any monotone increasing Boolean function is positively correlated with any other monotone increasing Boolean function, but negatively correlated with any monotone decreasing Boolean function. In this case, f is monotone increasing but g is monotone decreasing: Adding an edge doesn't disconnect a graph; deleting an edge doesn't invalidate a 4-coloring.

(Notice that when f is a Boolean function, E f is the probability that f is true under the given distribution. The fact that $\mathrm{covar}(f, g) \le 0$ in such a case is equivalent to saying that the conditional probability Pr(f | g) is $\le$ Pr(f).)

63. If $\omega$ is the event $(Z_0 = a, Z_1 = b)$, we have $Z_0(\omega) = a$ and $\mathrm{E}(Z_1 \mid Z_0)(\omega) = (p_{a1} + 2p_{a2})/(p_{a0} + p_{a1} + p_{a2})$. Hence $p_{01} = p_{02} = p_{20} = p_{21} = 0$, and $p_{10} = p_{12}$; these conditions are necessary and sufficient for $\mathrm{E}(Z_1 \mid Z_0) = Z_0$.

64. (a) No. Consider the probability space consisting of just three events $(Z_0, Z_1, Z_2)$ = (0, 0, -2), (1, 0, 2), (1, 2, 2), each with probability 1/3. Call those events a, b, c. Then $\mathrm{E}(Z_1 \mid Z_0)(a) = 0 = Z_0(a)$; $\mathrm{E}(Z_1 \mid Z_0)(b, c) = \frac12(0 + 2) = Z_0(b, c)$; $\mathrm{E}(Z_2 \mid Z_1)(a, b) = \frac12(-2 + 2) = Z_1(a, b)$; $\mathrm{E}(Z_2 \mid Z_1)(c) = 2 = Z_1(c)$. But $\mathrm{E}(Z_2 \mid Z_0, Z_1)(a) = -2 \ne Z_1(a)$.



October 3, 2015 




(b) Yes. We have $\sum_{z_{n+1}} (z_{n+1} - z_n)\Pr(Z_0 = z_0, \ldots, Z_{n+1} = z_{n+1}) = 0$ for all fixed $(z_0, \ldots, z_n)$. Sum these to get $\sum_{z_{n+1}} (z_{n+1} - z_n)\Pr(Z_n = z_n, Z_{n+1} = z_{n+1}) = 0$.
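
The three-point counterexample in part (a) of answer 64 is easy to verify mechanically. The following sketch computes conditional expectations over the uniform space of the three outcomes; the names `OUTCOMES` and `cond_exp` are illustrative only.

```python
from fractions import Fraction

# The three equally likely outcomes (Z0, Z1, Z2) called a, b, c in the answer.
OUTCOMES = [(0, 0, -2), (1, 0, 2), (1, 2, 2)]

def cond_exp(target, given, at):
    """E(Z_target | Z_i = at[i] for each i in given), over the uniform three-point space."""
    matching = [w for w in OUTCOMES if all(w[i] == at[i] for i in given)]
    return Fraction(sum(w[target] for w in matching), len(matching))
```

Each one-step condition holds at every outcome, yet conditioning on both $Z_0$ and $Z_1$ at outcome a gives -2 rather than $Z_1(a) = 0$.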

65. Observe first that $\mathrm{E}(Z_{n+1} \mid Z_0, \ldots, Z_k) = \mathrm{E}(\mathrm{E}(Z_{n+1} \mid Z_0, \ldots, Z_n) \mid Z_0, \ldots, Z_k) = \mathrm{E}(Z_n \mid Z_0, \ldots, Z_k)$ whenever k < n. Thus $\mathrm{E}(Z_{m(n+1)} \mid Z_0, \ldots, Z_{m(n)}) = Z_{m(n)}$ for all $n \ge 0$. Hence $\mathrm{E}(Z_{m(n+1)} \mid Z_{m(0)}, \ldots, Z_{m(n)}) = Z_{m(n)}$, as in the previous exercise.

66. We need to specify the joint distribution of {Zo, • • • ， Z n }, and it’s not difficult to see 
that there is only one solution. Let p{cn^ • • • ， a n ) = Pr(Zi = ai， ••” Z n = a n n) when 
(Ji ， …， <7 n are each 士 !_• The martingale law p(<Ji .. .a n l) (n+1) —p(ai .. .a n l) (n+1)= 
(Tnp{cri - - - (Tn)n = (T n {p{(Tl - - - CT n l) +p((Ji • • • (T u l))n gives p{(Ti . . . (Tn+l)/p{(Jl - - - <T n )= 

(1 + 2n[a n (Tn-\-i >0])/(2n + 2). Hence we find that Pr(Zi = zi^... ^ Z n = z n )= 
ffK 二 J(1 + ^k[zkZk+i > 0]))/(2 n n!). When n = 3, for example, the eight possible cases 
Z 1 Z 2 Z 3 = 123 ， 123, •. •, 123 occur with probabilities (15, 3,1 ， 5, 5,1 ， 3, 15)/48. 

67. (a) You "always" (with probability 1) make $2^{n+1} - (1 + 2 + \cdots + 2^n) = 1$ dollar.

(b) Your total payments are $X = X_0 + X_1 + \cdots$ dollars, where $X_n = 2^n$ with probability $2^{-n}$, otherwise $X_n = 0$. So $\mathrm{E}X_n = 1$, and $\mathrm{E}X = \mathrm{E}X_0 + \mathrm{E}X_1 + \cdots = \infty$.

(c) Let $\langle T_n\rangle$ be a sequence of uniformly random bits; and define the fair sequence $Y_n = (-1)^{T_n}2^n T_0 \ldots T_{n-1}$, or $Y_n = 0$ if there is no nth bet. Then $Z_n = Y_0 + \cdots + Y_n$.

[The famous adventurer Casanova lost a fortune in 1754 using this strategy, which he called "the martingale" in his autobiography Histoire de ma vie. A similar betting scheme had been proposed by Nicolas Bernoulli (see P. R. de Montmort, Essay d'Analyse sur les Jeux de Hazard, second edition (1713), page 402); and the perplexities of (a) and (b) were studied by his cousin Daniel Bernoulli, whose important paper in Commentarii Academiæ Scientiarum Imperialis Petropolitanæ 5 (1731), 175-192, has caused this scenario to become known as the St. Petersburg paradox.]
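
Parts (a) and (b) of answer 67 can be illustrated with a few lines of exact arithmetic. This sketch assumes the doubling strategy stakes $2^n$ dollars on round n and receives twice the stake on the first win; the function names are hypothetical.

```python
from fractions import Fraction

def martingale_outcome(flips):
    """Play the doubling strategy on a sequence of booleans (True = win).

    Returns (net gain, total amount staked) at the first win, or after the
    last flip if every round was lost.
    """
    paid = 0
    for n, win in enumerate(flips):
        stake = 2**n
        paid += stake
        if win:
            return 2 * stake - paid, paid
    return -paid, paid

def expected_staked(rounds):
    """Expected total stake over the first `rounds` rounds: each round contributes 2^n * 2^(-n) = 1."""
    return sum(Fraction(2**n) * Fraction(1, 2**n) for n in range(rounds))
```

The net gain is 1 whenever a win occurs, while the expected stake grows without bound as the number of permitted rounds increases.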

68. (a) Now $Z_n = Y_1 + \cdots + Y_n$, where $Y_n = (-1)^{T_n}[N \ge n]$. Again $\Pr(Z_N = 1) = 1$.

(b) The generating function g(z) equals $z(1 + g(z)^2)/2$, since he must win $2 if the first bet loses. Hence $g(z) = (1 - \sqrt{1 - z^2})/z$; and the desired probability is $[z^n]\,g(z) = C_{(n-1)/2}[n \text{ odd}]/2^n$, where $C_k$ is the Catalan number $\binom{2k}{k}/(k + 1)$.

(c) Pr(iV > n) = [^] (1 - zg(z))/{l -^) = [^](1 + z)/^T^= (^j J )/2 Ln/2J • 

(d) E N = g (1) = oo. (It’s also Pr(N > n), where Pr(iV > n) 〜 l/^/irn.) 

(e) Let $p_m = \Pr(Z_n \ge -m \text{ for all } n \ge 0)$. Clearly $p_0 = 1/2$ and $p_m = (1 + p_{m-1}p_m)/2$ for m > 0; this recurrence has the solution $p_m = (m + 1)/(m + 2)$. So the answer is 1/((m + 1)(m + 2)); it's another probability distribution with infinite mean.

(f) The generating function $g_m(z)$ for the number of times -m is hit satisfies $g_0(z) = z/(2 - z)$, $g_m(z) = (1 + g_{m-1}(z)g_m(z))/2$ for m > 0. So $g_m(z) = h_m(z)/h_{m+1}(z)$ for $m \ge 0$, where $h_m(z) = 2m - (2m - 1)z$, and $g'_m(1) = 2$. [A distribution with finite mean! See W. Feller, An Intro. to Probability Theory 2, second edition (1971), XII.2.]
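
The closed forms in parts (e) and (f) of answer 68 can be confirmed by iterating the recurrences, each of which rearranges to "new = 1/(2 - old)". A minimal sketch with exact rationals follows; the function names `p`, `g`, and `h` mirror the symbols of the answer but are otherwise illustrative.

```python
from fractions import Fraction

def p(m):
    """Iterate p_m = (1 + p_{m-1} p_m)/2, i.e. p_m = 1/(2 - p_{m-1}), from p_0 = 1/2."""
    value = Fraction(1, 2)
    for _ in range(m):
        value = 1 / (2 - value)
    return value

def g(m, z):
    """Evaluate g_m(z) from g_0(z) = z/(2 - z) and g_m = 1/(2 - g_{m-1})."""
    value = z / (2 - z)
    for _ in range(m):
        value = 1 / (2 - value)
    return value

def h(m, z):
    """h_m(z) = 2m - (2m - 1)z."""
    return 2 * m - (2 * m - 1) * z
```

For instance, p(m) - p(m-1) reproduces the probability 1/((m+1)(m+2)) stated in part (e).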

69. Each permutation of n elements corresponds to a configuration of n + 1 balls in the urn. For Method 1, the number of corresponding "red balls" is the position of element 1; for Method 2, it is the value in position 1. For example, we'd put 3 1 2 4 into node (2, 3) with respect to Method 1 but into (3, 2) with respect to Method 2. (In fact, Methods 1 and 2 construct permutations that are inverses of each other.)

70. Start with the permutation 1 2 ... (c - 1) at the root, and use Method 1 of the previous exercise to generate all n!/(c - 1)! permutations in which these elements retain that order. A permutation with j in position $P_j$ for $1 \le j < c$ stands for $P_j - P_{j-1}$ balls of color j, where $P_0 = 0$ and $P_c = n + 1$; for example, if c = 3, the permutation 3 1 4 2 would correspond to node (2, 2, 1). The resulting tuples $(A_1, \ldots, A_c)/(n + 1)$ then form a martingale for n = c, c + 1, ..., uniformly distributed (for each n) among all $\binom{n}{c-1}$ compositions of n + 1 into c positive parts.

[We can also use this setup to deal with Pólya's two-color model when there are r red balls and b black balls at the beginning: Imagine r + b colors, then identify the first r of them with red. This model was first studied by D. Blackwell and D. Kendall, J. Applied Probability 1 (1964), 284-296.]

71. If m = r' - r and n = b' - b we must move m times to the right and n times to the left; there are $\binom{m+n}{n}$ such paths. Every path occurs with the same probability, because the numerators of the fractions are $r \cdot (r+1) \cdot \ldots \cdot (r'-1) \cdot b \cdot (b+1) \cdot \ldots \cdot (b'-1) = r^{\overline{m}}\,b^{\overline{n}}$ in some order, and the denominators are $(r+b) \cdot (r+b+1) \cdot \ldots \cdot (r'+b'-1) = (r+b)^{\overline{m+n}}$.

The answer, $\binom{m+n}{n}\,r^{\overline{m}}\,b^{\overline{n}}/(r+b)^{\overline{m+n}}$, reduces to 1/(r' + b' - 1) when r = b = 1.
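
The closed form in answer 71 can be compared against a direct recursion on the urn process. The sketch below assumes the standard Pólya rule (draw a ball uniformly, return it together with one more of the same color); the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def rising(x, k):
    """Rising factorial power x(x+1)...(x+k-1)."""
    out = 1
    for i in range(k):
        out *= x + i
    return out

def polya_formula(r, b, r2, b2):
    """Probability of reaching (r2, b2) from (r, b), by the closed form of the answer."""
    m, n = r2 - r, b2 - b
    return Fraction(comb(m + n, n) * rising(r, m) * rising(b, n), rising(r + b, m + n))

def polya_direct(r, b, r2, b2):
    """The same probability, by recursion on single draws."""
    if (r, b) == (r2, b2):
        return Fraction(1)
    if r > r2 or b > b2:
        return Fraction(0)
    total = r + b
    return (Fraction(r, total) * polya_direct(r + 1, b, r2, b2)
            + Fraction(b, total) * polya_direct(r, b + 1, r2, b2))
```

When r = b = 1 both functions give the uniform value 1/(r' + b' - 1).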

72. Since all paths have the same probability, this expected value is the same as $\mathrm{E}(X_1X_2 \ldots X_m)$, which is obviously 1/(m + 1). (Thus the X's are very highly correlated: This expected value would be $1/2^m$ if they were independent. Notice that the probability of an event such as $(X_2 = 1, X_5 = 0, X_6 = 1)$ is $\mathrm{E}(X_2(1 - X_5)X_6) = 1/3 - 1/4$.)

[The far-reaching ramifications of such exchangeable random variables are surveyed in O. Kallenberg's book Probabilistic Symmetries and Invariance Principles (2005).]

73. /(r, n)=r( nJ ^ 1 ) ( r ^) (~^) k qn+i-r+k, where q k = a k /(k-\-l), by induction on r. 

74. Node (r, n + 2 - r) on level n is reached with probability $\left\langle{n+1 \atop r-1}\right\rangle/(n+1)!$, proportional to an Eulerian number (see Section 5.1.3). (Indeed, we can associate the permutations of {1, ..., n+1} that have exactly r runs with this node, using Method 1 as in exercise 69.)

Reference: Communications on Pure and Applied Mathematics 2 (1949), 59-70.

75. As before, let $R_n = X_0 + \cdots + X_n$ be the number of red balls at level n. Now we have $\mathrm{E}(X_{n+1} \mid X_0, \ldots, X_n) = 1 - R_n/(n + 2)$. Hence $\mathrm{E}(R_{n+1} \mid R_n) = (n + 1)R_n/(n + 2) + 1$, and the definition $Z_n = (n + 1)R_n - (n + 2)(n + 1)/2$ is a natural choice.

76. No. For example, let $Z_0 = X$, $Z'_0 = Y$, and $Z_1 = Z'_1 = X + Y$, where X and Y are independent with EX = EY = 0. Then $\mathrm{E}(Z_1 \mid Z_0) = Z_0$ and $\mathrm{E}(Z'_1 \mid Z'_0) = Z'_0$, but $\mathrm{E}(Z_1 + Z'_1 \mid Z_0 + Z'_0) = 2(Z_0 + Z'_0)$. (On the other hand, if $\langle Z_n\rangle$ and $\langle Z'_n\rangle$ are both martingales with respect to some common sequence $\langle X_n\rangle$, then $\langle Z_n + Z'_n\rangle$ is also.)

77. $\mathrm{E}(Z_{n+1} \mid Z_0, \ldots, Z_n) = \mathrm{E}(\mathrm{E}(Z_{n+1} \mid Z_0, \ldots, Z_n, X_0, \ldots, X_n) \mid Z_0, \ldots, Z_n)$, which equals $\mathrm{E}(\mathrm{E}(Z_{n+1} \mid X_0, \ldots, X_n) \mid Z_0, \ldots, Z_n)$ because $Z_n$ is a function of $X_0, \ldots, X_n$; and that equals $\mathrm{E}(Z_n \mid Z_0, \ldots, Z_n) = Z_n$. (Furthermore $\langle Z_n\rangle$ is a martingale with respect to, say, a constant sequence. But not with respect to every sequence.)

A similar proof shows that any sequence $\langle Y_n\rangle$ that is fair with respect to $\langle X_n\rangle$ is also fair with respect to itself.

78. $\mathrm{E}(Z_{n+1} \mid V_0, \ldots, V_n) = \mathrm{E}(Z_nV_{n+1} \mid V_0, \ldots, V_n) = Z_n$.

The converse holds with $V_0 = Z_0$ and $V_n = Z_n/Z_{n-1}$ for n > 0, provided that $Z_{n-1} = 0$ implies $Z_n = 0$, and that we define $V_n = 1$ when that happens.

79. $Z_n = V_0V_1 \ldots V_n$, where $V_0 = 1$ and each $V_n$ for n > 0 is independently equal to q/p (with probability p) or to p/q (with probability q). Since $\mathrm{E}(V_n) = q + p = 1$, $\langle V_n\rangle$ is multiplicatively fair. [See A. de Moivre, The Doctrine of Chances (1718), 102-154.]

80. (a) True; in fact $\mathrm{E}(f_n(Y_0, \ldots, Y_{n-1})Y_n) = 0$ for any function $f_n$.

(b) False: For example, let I 5 = 士 1 if K > 0, otherwise I 5 = 0. (Hence 
permutations of a fair sequence needn’t be fair. The statement is, however, true if 
the Y^s are independent with mean zero.) 
(c) False if m =0 and m = 1 (or if m = 0); otherwise true. (Sequences that satisfy $\mathrm{E}((Y_{n_1} - \mathrm{E}Y_{n_1}) \ldots (Y_{n_m} - \mathrm{E}Y_{n_m})) = \mathrm{E}(Y_{n_1} - \mathrm{E}Y_{n_1}) \ldots \mathrm{E}(Y_{n_m} - \mathrm{E}Y_{n_m})$ are called totally uncorrelated. Such sequences, with $\mathrm{E}Y_n = 0$ for all n, are not always fair; but fair sequences are always totally uncorrelated.)

81. Assuming that Xo, … ， X n can be deduced from Zo, …， we have a n X n + 

— — E(^72+l | ^0^ • • • 5 = E(G72+l^"n+l + &n+l -^-n ^ 0 ， • • • ， ) = 

^n+l + -^n— 1 ) + bn +1 X n for Ti ^ 1 . HGI 1 C 6 Cl n -^i = bn ^ &n+l — dn — = ^n— 1 — ] 

and we have a n = F— n -i ， b n = F _ n _2 by induction, verifying the assumption. 

[See J- B. MacQueen, Annals of Probability 1 (1973) ， 263—271.] 

82. (a) $Z_n = A_n/C_n$, where $A_n = 4 - X_1 - \cdots - X_n$ is the number of aces and $C_n$ is the number of cards remaining after you've seen n cards. Hence $\mathrm{E}(Z_{n+1} \mid Z_0, \ldots, Z_n) = (A_n/C_n)(A_n - 1)/(C_n - 1) + (1 - A_n/C_n)A_n/(C_n - 1) = A_n/C_n$. (In every generalization of Pólya's urn for which the nth step adds $k_n$ balls of the chosen color, the ratio red/(red + black) is always a martingale, even when $k_n$ is negative, as long as enough balls of the chosen color remain. This exercise represents the case $k_n = -1$.)

(b) This is the optional stopping principle in a bounded-time martingale.

(c) $Z_N = A_N/C_N$ is the probability that an ace will be next. ["Ace Now" is a variant of R. Connelly's game "Say Red"; see Pallbearers Review 9 (1974), 702.]
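
The optional stopping claim in answer 82 can be illustrated by dynamic programming over all strategies. This sketch assumes the rules of "Ace Now" are as follows (the exercise statement is not reproduced here): cards are turned over one at a time from a shuffled deck, the player may say "now" before any card, wins if that card is an ace, and must accept the last card if still silent. The function name is hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def best_win_probability(aces, cards):
    """Largest achievable probability that the called card is an ace."""
    stop = Fraction(aces, cards)            # value of saying "now" immediately
    if cards == 1:
        return stop                         # the last card must be called
    cont = Fraction(0)                      # value of watching one more card first
    if aces > 0:
        cont += stop * best_win_probability(aces - 1, cards - 1)
    if aces < cards:
        cont += (1 - stop) * best_win_probability(aces, cards - 1)
    return max(stop, cont)
```

Because $A_n/C_n$ is a martingale, stopping and continuing always have the same value, so no strategy beats 4/52 with a standard deck.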

83. $Z_n = \sum_{k=1}^{n}(X_k - \mathrm{E}X_k)$ is a martingale, for which we can study the bounded stopping rules min(m, N) for any m. But Svante Janson suggests a direct computation, beginning with the formula $S_N = \sum_{n=1}^{\infty} X_n[N \ge n]$ where N might be $\infty$: We have $\mathrm{E}(X_n[N \ge n]) = (\mathrm{E}X_n)(\mathrm{E}[N \ge n])$, because $[N \ge n]$ is a function of $\{X_0, \ldots, X_{n-1}\}$, hence independent of $X_n$. And since $X_n \ge 0$, we have $\mathrm{E}S_N = \sum_{n=1}^{\infty}\mathrm{E}(X_n[N \ge n]) = \sum_{n=1}^{\infty}(\mathrm{E}X_n)\,\mathrm{E}[N \ge n] = \sum_{n=1}^{\infty}\mathrm{E}((\mathrm{E}X_n)[N \ge n]) = \mathrm{E}\sum_{n=1}^{\infty}(\mathrm{E}X_n)[N \ge n]$, which is $\mathrm{E}\sum_{n=1}^{N}\mathrm{E}X_n$. (The equation might be '$\infty = \infty$'.)

[Wald's original papers, in Annals of Mathematical Statistics 15 (1944), 283-296, 16 (1945), 287-293, solved a somewhat different problem and proved more.]

84. (a) We have $f(Z_n) = f(\mathrm{E}(Z_{n+1} \mid Z_0, \ldots, Z_n)) \le \mathrm{E}(f(Z_{n+1}) \mid Z_0, \ldots, Z_n)$ by Jensen's inequality. And the latter is $\mathrm{E}(f(Z_{n+1}) \mid f(Z_0), \ldots, f(Z_n))$ as in answer 77. [Incidentally, D. Gilat has shown that every nonnegative submartingale is $\langle |Z_n| \rangle$ for some martingale $\langle Z_n\rangle$; see Annals of Probability 5 (1977), 475-481.]

(b) Again we get a submartingale, provided that we also have $f(x) \le f(y)$ for $a \le x \le y \le b$. [J. L. Doob, Stochastic Processes (1953), 295-296.]

85. Since $\langle B_n/(R_n + B_n)\rangle = \langle 1 - R_n/(R_n + B_n)\rangle$ is a martingale by (27), and since f(x) = 1/x is convex for positive x, $\langle (R_n + B_n)/B_n\rangle = \langle R_n/B_n + 1\rangle$ is a submartingale by exercise 84. (A direct proof could also be given.)

86. The rule $N_{n+1}(Z_0, \ldots, Z_n) = [\max(Z_0, \ldots, Z_n) < x \text{ and } n + 1 < m]$ is bounded. If $\max(Z_0, \ldots, Z_{m-1}) < x$ then we have $Z_N < x$, where N is defined by (31); similarly, if $\max(Z_0, \ldots, Z_{m-1}) \ge x$ then $Z_N \ge x$. Hence $\Pr(\max(Z_0, \ldots, Z_n) \ge x) \le (\mathrm{E}Z_N)/x$ by Markov's inequality; and $\mathrm{E}Z_N \le \mathrm{E}Z_n$ in a submartingale.

87. This is the probability that $Z_n$ becomes 3/4, which also is $\Pr(\max(Z_0, \ldots, Z_n) \ge 3/4)$. But $\mathrm{E}Z_n = 1/2$ for all n; hence (33) tells us that it is at most (1/2)/(3/4) = 2/3.

(The exact value can be calculated as in the following exercise. It turns out to be $\sum_{k=0}^{\infty} 2/((4k+2)(4k+3)) = \frac12 H_{3/4} - \frac12 H_{1/2} + \frac13 \approx .439$.)

88. (a) We have S > 1/2 if and only if there comes a time when there are more red balls than black balls. Since that happens if and only if the process passes through one of the nodes (2, 1), (3, 2), (4, 3), ..., the desired probability is $p_1 + p_2 + \cdots$, where $p_k$ is the probability that node (k + 1, k) is hit before any of (j + 1, j) for j < k.

All paths from the root to (k + 1, k) are equally likely, and the paths that meet our restrictions are equivalent to the paths in 7.2.1.6-(28). Thus we can use Eq. 7.2.1.6-(23) to show that $p_k = 1/(2k - 1) - 1/(2k)$; and $1 - 1/2 + 1/3 - 1/4 + \cdots = \ln 2$.

(b,c) If $p_k$ is the probability of hitting node ((t - 1)k + 1, k) before any previous ((t - 1)j + 1, j), a similar calculation using the t-ary ballot numbers $C^{(t)}_{pq}$ yields $p_k = (t - 1)(1/(tk - 1) - 1/(tk))$. Then $\sum_{k\ge1} p_k = 1 - (1 - 1/t)H_{1-1/t}$ (see Appendix A).
Notes: We have $\Pr(S=1/2)=1-\ln 2$, since $S$ is always $\ge 1/2$. But we cannot claim that $\Pr(S\ge 2/3)$ is the sum of cases that pass through (2,1), (4,2), (6,3), etc., because the supremum might be 2/3 even though the value 2/3 is never reached. Those cases occur with probability $\pi/\sqrt{27}$; hence $\Pr(S=2/3)\ge 2\pi/\sqrt{27}-\ln 3\approx .111$. A determination of the exact value of $\Pr(S=2/3)$ is beyond the scope of this book, because we've avoided the complications of measure theory by defining probability only in discrete spaces; we can't consider a limiting quantity such as $S$ to be a random variable, by our definitions! But we can assign a probability to the event that $\max(Z_0,Z_1,\ldots,Z_n)>x$, for any given $n$ and $x$; and we can reason about the limits of such probabilities.

With the help of deeper methods, E. Schulte-Geers and W. Stadje have proved that the supremum is reached within $n$ steps, a.s. Hence $\Pr(S=2/3)=2\pi/\sqrt{27}-\ln 3$; indeed, $\Pr(S$ is rational$)=1$, since only rationals are reached; and $\Pr(S=(t-1)/t)=(2-3/t)H_{1-1/t}-(1-2/t)H_{1-2/t}-(t-2)/(t-1)$. [J. Applied Prob. 52 (2015), 180–190.]

89. Set $Y_n=X_n-p_n$, $a_n=-p_n$, $b_n=1-p_n$. (Incidentally, exercise 1.2.10–22 gives an upper bound for this quantity that has quite a different form.)

90. (a) Apply Markov's inequality to $\Pr(e^{(Y_1+\cdots+Y_n)t}\ge e^{tx})$.

(b) $e^{yt}\le e^{-pt}(q-y)+e^{qt}(y+p)=e^{f(t)}+ye^{g(t)}$ because the function $e^{yt}$ is convex.

(c) We have $f'(t)=-p+pe^t/(q+pe^t)$ and $f''(t)=pqe^t/(q+pe^t)^2$; hence $f(0)=f'(0)=0$. And $f''(t)\le 1/4$, because the geometric mean of $q$ and $pe^t$, $(pqe^t)^{1/2}$, is less than or equal to the arithmetic mean, $(q+pe^t)/2$.

(d) Set c = b — a，p = —a/c^ q = 6 /c, Y = Y/c^ t = ct^ h(t) = y( c ’)/c. 

(e) In $\mathrm{E}\bigl((e^{c_1t^2/4}+Y_1h_1(t))\ldots(e^{c_nt^2/4}+Y_nh_n(t))\bigr)$ the terms involving $h_k(t)$ all drop out, because $\langle Y_n\rangle$ is fair. So we're left with the constant term, $e^{ct^2/4}$.

(f) Let $t=2x/c$, to make $ct^2/4-xt=-x^2/c$.

91. $\mathrm{E}(Q_{n+1}\mid X_0,\ldots,X_n)=\mathrm{E}(\mathrm{E}(Q\mid X_0,\ldots,X_{n+1})\mid X_0,\ldots,X_n)$; and this is equal to $\mathrm{E}(Q\mid X_0,\ldots,X_n)$ by formula (*) in answer 35. Apply exercise 77.

92. $Q_0=\mathrm{E}\,X_m=1/2$. If $n<m$ we have $Q_n=\mathrm{E}(X_m\mid X_0,\ldots,X_n)$, which is the same as $\mathrm{E}(X_{n+1}\mid X_0,\ldots,X_n)$ (see exercise 72); and this is $(1+X_1+\cdots+X_n)/(n+2)$, which is the same as $Z_n$ in (27). If $n\ge m$, however, we have $Q_n=X_m$.

93. Everything goes through exactly as before, except that we must replace the quantity $(m-1)^t/m^{t-1}$ by the generalized expected value, which is $\sum_{k=1}^{m}\prod_{n=1}^{t}(1-p_{nk})$.

94. If the X^s are dependent, the Doob martingale still is well defined; but when 

we write its fair sequence as an average of A(xi^... ^ xt) there is no longer a nice 
formula such as ( 40 ). In any formula for A that has the form ^2 x Px(Q( • • • • • •)— 

Pr(J^C n = Xn ， 义 n +1 = Xn-\-l ，•••）/ (Pr(J^ n = = ， • • •）) 

must equal Px^ so it must be independent of x n . Thus ( 41 ) can’t be used. 

95. False; the probability of only one red ball at level $n$ is $1/(n+1)=\Omega(n^{-1})$. But there are a.s. more than 100 red balls, because that happens with probability $(n-99)/(n+1)$.
96. Exercise 1.2.10–21, with $\epsilon n$ equal to the bound on $|X-n/2|$, tells us that (i) is q.s. and that (i), (ii), (iii) are a.s. To prove that (iv) isn't a.s., we can use Stirling's approximation to show that $\binom{n}{n/2\pm k}=\Theta(2^n/\sqrt n)$ when $k=\sqrt n$; consequently $\Pr(|X-n/2|<\sqrt n)=\Theta(1)$. A similar calculation shows that (ii) isn't q.s.


97. We need to show only that a single bin q.s. receives that many. The probability generating function for the number of items $H$ that appear in any particular bin is $G(z)=((n-1+z)/n)^N$, where $N=\lfloor n^{1+\delta}\rfloor$. If $r=\frac12n^\delta$ we have

$$\Pr(H\le r)\le\bigl(\tfrac12\bigr)^{-r}G\bigl(\tfrac12\bigr)=2^r\Bigl(\frac{2n-1}{2n}\Bigr)^{\lfloor 2nr\rfloor}\le 2^r\Bigl(\frac{2n-1}{2n}\Bigr)^{2nr-1}\le 2^{r+1}e^{-r}$$

by 1.2.10–(24). And if $r=2n^\delta$ we have

$$\Pr(H\ge r)\le 2^{-r}G(2)=2^{-r}\Bigl(\frac{n+1}{n}\Bigr)^{\lfloor nr/2\rfloor}\le 2^{-r}\Bigl(\frac{n+1}{n}\Bigr)^{nr/2}\le 2^{-r}e^{r/2}$$

by 1.2.10–(25). Both are exponentially small. [See Knuth, Motwani, and Pittel, Random Structures & Algorithms 1 (1990), 1–14, Lemma 1.]

98. Let $E_n=\mathrm{E}\,R$, where $R$ is the number of reduction steps; and suppose $F(n)=k$ with probability $p_k$, where $\sum_{k=1}^{n}p_k=1$ and $\sum_{k=1}^{n}kp_k=g\ge g_n$. (The values of $p_1$, ..., $p_n$, and $g$ might be different, in general, every time we compute $F(n)$.)

Let $\Sigma_a^b=\sum_{j=a}^{b}1/g_j$. Clearly $E_0=0$. And if $n>0$, we have by induction

$$E_n=1+\sum_{k=1}^{n}p_kE_{n-k}\le 1+\sum_{k=1}^{n}p_k\bigl(\Sigma_1^n-\Sigma_{n-k+1}^n\bigr)=\Sigma_1^n+1-\sum_{k=1}^{n}p_k\Sigma_{n-k+1}^n\le\Sigma_1^n+1-\sum_{k=1}^{n}p_k\frac{k}{g_n}\le\Sigma_1^n.$$

[See R. M. Karp, E. Upfal, and A. Wigderson, J. Comp. and Syst. Sci. 36 (1988), 252.]
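To see the bound of answer 98 in action, here is a small exact computation for one particular reduction rule, chosen purely for illustration: $F(x)$ is uniform on $\{1,\ldots,x\}$, so that $g_x=\mathrm{E}\,F(x)=(x+1)/2$ is nondecreasing. That choice of $F$, and both function names, are assumptions that do not come from the exercise.

```python
from fractions import Fraction

def expected_steps(n_max):
    """E_n for the reduction X <- X - F(X), with F(x) uniform on {1, ..., x}."""
    E = [Fraction(0)]                       # E_0 = 0
    for n in range(1, n_max + 1):
        # E_n = 1 + sum_k p_k E_{n-k}, with p_k = 1/n for 1 <= k <= n.
        E.append(1 + sum(E[n - k] for k in range(1, n + 1)) / n)
    return E

def reciprocal_sums(n_max):
    """Sigma_1^n = sum_{j=1}^{n} 1/g_j for the illustrative g_j = (j+1)/2."""
    S = [Fraction(0)]
    for j in range(1, n_max + 1):
        S.append(S[-1] + Fraction(2, j + 1))
    return S

E = expected_steps(20)
S = reciprocal_sums(20)
for n in (1, 2, 5, 10, 20):
    print(n, float(E[n]), float(S[n]))
```

For this rule $E_n$ is the harmonic number $H_n$, and the printed bound $\Sigma_1^n=2(H_{n+1}-1)$ stays above it, with equality at $n=1$.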


99. The same proof would work, provided that induction could be justified, if we were to do the sums from $k=-\infty$ to $n$ and define $\Sigma_a^b=-\sum_{j=b+1}^{a-1}1/g_j$ when $a>b$. (For example, that definition gives $-\Sigma_{n+3}^{n}=1/g_{n+1}+1/g_{n+2}\le 2/g_n$.)

And in fact it does become a proof, by induction on $m$, that we have $E_{m,n}\le\Sigma_1^n$ for all $m,n\ge 0$, where $E_{m,n}=\mathrm{E}\min(m,R)$. Indeed, we have $E_{0,n}=E_{m+1,0}=0$; and $E_{m+1,n}=1+\sum_{k=-\infty}^{n}p_kE_{m,n-k}$ when $n>0$. [This problem is exercise 1.6 in Randomized Algorithms by Motwani and Raghavan (1995). Svante Janson observes that the random variable $Z_m=\Sigma_1^{X_m}+\min(m,R)$ is a supermartingale, where $X_m$ is the value of $X$ after $m$ iterations, as a consequence of this proof.]

100. (a) $\sum_{k=1}^{m}kp_k\le\mathrm{E}\min(m,T)=p_1+2p_2+\cdots+mp_m+mp_{m+1}+\cdots+mp_\infty\le\mathrm{E}\,T$.
(b) $\mathrm{E}\min(m,T)\ge mp_\infty$ for all $m$. (We assume that $\infty\cdot p=(p>0$? $\infty$: $0)$.)

101. (Solution by Svante Janson.) If $0\le t<\min(p_1,\ldots,p_m)=p$, we have $\mathrm{E}\,e^{tX}=\prod_{k=1}^{m}\mathrm{E}\,e^{tX_k}=\prod_{k=1}^{m}p_k/(e^{-t}-1+p_k)\le\prod_{k=1}^{m}p_k/(p_k-t)$, because $e^{-t}-1\ge -t$. By 1.2.10–(25), therefore, and setting $t=\theta/\mu$, $\Pr(X\ge r\mu)\le e^{-rt\mu}\prod_{k=1}^{m}p_k/(p_k-t)=\exp(-r\theta-\sum_{k=1}^{m}\ln(1-t/p_k))\le\exp(-r\theta-\sum_{k=1}^{m}(t/p_k)\ln(1-\theta)/\theta)=\exp(-r\theta-\ln(1-\theta))$. Choose $\theta=(r-1)/r$ to get the desired bound $re^{1-r}$. (The bound is nearly sharp when $m=1$ and $p$ is small, since $\Pr(X\ge r/p)=(1-p)^{\lceil r/p\rceil-1}\approx e^{-r}$.)

102. Applying exercise 101 with $\mu\le s_1+\cdots+s_m$ and $r=\ln n$ gives probability $O(n^{-1}\log n)$ that $(s_1+\cdots+s_m)r$ trials aren't enough. And if $r=f(n)\ln n$, where $f(n)$ is any increasing function that is unbounded as $n\to\infty$, the probability that $s_kr$ trials don't obtain coupon $k$ is superpolynomially small. So is the probability that any one of a polynomial number of such failures will occur.




October 3, 2015 





103. (a) The recurrence $p_{0ij}=[i=j]$, $p_{(n+1)ij}=\sum_{k=0}^{2}p_{nik}([f_0(k)=j]+[f_1(k)=j])/2$ leads to generating functions $g_{ij}=\sum_{n=0}^{\infty}p_{nij}z^n$ that satisfy $g_{i0}=[i=0]+(g_{i0}+g_{i1})z/2$, $g_{i1}=[i=1]+(g_{i0}+g_{i2})z/2$, $g_{i2}=[i=2]+(g_{i1}+g_{i2})z/2$. From the solution $g_{i0}=A+B+C$, $g_{i1}=A-2B$, $g_{i2}=A+B-C$, $A=\frac13/(1-z)$, $B=\frac16(1-3[i=1])/(1+z/2)$, and $C=\frac12([i=0]-[i=2])/(1-z/2)$, we conclude that the probability is $\frac13+O(2^{-n})$; in fact it is always either $\lfloor 2^n/3\rfloor/2^n$ or $\lceil 2^n/3\rceil/2^n$. The former occurs if and only if $i\ne j$ and $n$ is even, or $i+j=2$ and $n$ is odd.
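The floor/ceiling rule at the end of part (a) can be confirmed exactly for small $n$ by iterating the recurrence. The sketch below reads the transition structure off the three equations for $g_{i0}$, $g_{i1}$, $g_{i2}$ (state 0 is entered from states 0 and 1, state 1 from 0 and 2, state 2 from 1 and 2); the names `PREDECESSORS`, `distribution`, and `rule_holds` are hypothetical.

```python
from fractions import Fraction

# Predecessor sets implied by the equations for g_{i0}, g_{i1}, g_{i2} in part (a).
PREDECESSORS = {0: (0, 1), 1: (0, 2), 2: (1, 2)}

def distribution(i, n):
    """Return [p_{nij} for j = 0, 1, 2] as exact fractions."""
    p = [Fraction(int(i == j)) for j in range(3)]
    for _ in range(n):
        p = [(p[PREDECESSORS[j][0]] + p[PREDECESSORS[j][1]]) / 2 for j in range(3)]
    return p

def rule_holds(i, j, n):
    """Check the stated rule: floor(2^n/3)/2^n in the listed cases, else the ceiling."""
    floor_case = (i != j and n % 2 == 0) or (i + j == 2 and n % 2 == 1)
    numerator = (2 ** n) // 3 + (0 if floor_case else 1)   # 2^n is never a multiple of 3
    return distribution(i, n)[j] == Fraction(numerator, 2 ** n)

print(all(rule_holds(i, j, n) for n in range(12) for i in range(3) for j in range(3)))
```

Exact rational arithmetic is used so that the comparison with $\lfloor 2^n/3\rfloor/2^n$ is not blurred by rounding.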

(b) Letting $g_{012}=\frac z2(g_{001}+g_{122})$, $g_{001}=\frac z2([j=0]+g_{011})$, etc., yields the generating function $g_{012}=([j\ne 1]+[j=1]z)z^2/(4-z^2)$. Hence each $j$ occurs with probability 1/3, and the generating function for $N$ is $z^2/(2-z)$; mean = 3, variance = 2.

(c) Now $g_{001}=\frac z2([j=0]+g_{112})$, etc.; the output is never 1; 0 and 2 are equally likely; and $N$ has the same distribution as before.

(d) Functional composition isn't commutative, so the stopping criterion is different: In the second case, 111 cannot occur unless the previous step had 000 or 222. The crucial difference is that, without stopping, process (b) becomes fixed at coalescence; process (c) continues to change $a_0a_1a_2$ as $n$ increases (although all three remain equal).
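The contrast between (b) and (c) can be reproduced by tracking the array $a_0a_1a_2$ exactly until it coalesces. The sketch assumes $f_0(k)=\max(k-1,0)$ and $f_1(k)=\min(k+1,2)$, each chosen with probability 1/2, and assumes that (b) updates $a_k\leftarrow a_{f(k)}$ while (c) updates $a_k\leftarrow f(a_k)$; these choices are inferred from the recurrences in the answer, not quoted from the exercise, and all names are hypothetical.

```python
from collections import defaultdict

# Assumed maps, inferred from the recurrences in answer 103.
F = [lambda k: max(k - 1, 0), lambda k: min(k + 1, 2)]

def step_b(a, f):
    """Process (b): compose on the inside, a_k <- a_{f(k)}."""
    return tuple(a[f(k)] for k in range(3))

def step_c(a, f):
    """Process (c): compose on the outside, a_k <- f(a_k)."""
    return tuple(f(x) for x in a)

def output_distribution(step, max_steps=200):
    """Return ([Pr(output = j) for j = 0, 1, 2], E N), truncated after max_steps."""
    live = {(0, 1, 2): 1.0}
    out = [0.0, 0.0, 0.0]
    mean_n = 0.0
    for n in range(1, max_steps + 1):
        nxt = defaultdict(float)
        for a, pr in live.items():
            for f in F:
                b = step(a, f)
                if b[0] == b[1] == b[2]:      # coalescence: stop and output b[0]
                    out[b[0]] += pr / 2
                    mean_n += n * pr / 2
                else:
                    nxt[b] += pr / 2
        live = nxt
    return out, mean_n

print(output_distribution(step_b))
print(output_distribution(step_c))
```

Under these assumptions the first line is close to a uniform output with mean 3, and the second never outputs 1, in agreement with parts (b) and (c).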

(e) If $T$ is even, sub(T) returns $(-1,0,1,2)$ with probability $(2,\ (2^T-1)/3,\ (2^T-4)/3,\ (2^T-1)/3)/2^T$. Thus the supposed alternative to (b) will output 0 with probability $\frac14+\frac5{32}+\frac{85}{4096}+\cdots=\frac13\sum_{k=1}^{\infty}2^{k+1}(2^{2^k}-1)/2^{2^{k+1}}\approx 0.427$, not 1/3.

(f) Change sub(T) to use consistent bits $X_T$, $X_{T-1}$, ..., $X_1$ instead of generating new random bits $X$ each time; then the method of (b) is faithfully simulated. (The necessary consistency can be achieved by carefully resetting the seed of a suitable random number generator at appropriate times.)

[The technique of (f) is called “coupling from the past” in a monotone Monte Carlo simulation. It can be used to generate uniformly random objects of many important kinds, and it runs substantially faster than method (b) when there are thousands or millions of possible states instead of just three. See J. G. Propp and D. B. Wilson, Random Structures & Algorithms 9 (1996), 223–252.]

104. Let $q=1-p$. The probability of output (0, 1, 2) in (b) is $(q^2,\ 2pq,\ p^2)$; in (c) it is $(p^2+pq^2,\ 0,\ q^2+qp^2)$. In both cases $N$ has generating function $(1-pq(2-z))z^2/(1-pqz^2)$, mean $3/(1-pq)-1$, variance $(5-2pq)pq/(1-pq)^2$.

105. Suppose $n=2m$ is even. Experiments for small $m$ suggest that there are polynomials $t_k$ such that $g_a=z^at_{m-a}/t_m$ for $0\le a\le m$; and indeed, the polynomials defined by $t_0=t_1=1$, $t_{k+1}=2t_k-z^2t_{k-1}$ fill the bill, because they make $g_m=zg_{m-1}$. The generating function $T(w)=\sum_{m=0}^{\infty}t_mw^m=(1-w)/(1-2w+w^2z^2)$ now shows, after differentiation by $z$, that we have $t_m'(1)=-m(m-1)$ and $t_m''(1)=(m^2-5m+3)m(m-1)/3$; hence $t_m''(1)+t_m'(1)-t_m'(1)^2=\frac23(m^2-m^4)$. The mean and variance, given $a$, are therefore $a-(m-a)(m-a-1)+m(m-1)=a(n-a)$ and $\frac23((m-a)^2-(m-a)^4-m^2+m^4)=\frac13(n^2-2a(n-a)-2)a(n-a)$, respectively.

When $n=2m-1$ we can write $g_a=z^au_{m-a}/u_m$ for $0\le a\le m$, with $u_{m+1}=2u_m-z^2u_{m-1}$. In this case we want $u_0=1$ and $u_1=z$, so that $g_m=g_{m-1}$. From $U(w)=\sum_{m=0}^{\infty}u_mw^m=(1+(z-2)w)/(1-2w+w^2z^2)$ we deduce $u_m'(1)=-m(m-2)$ and $u_m''(1)=m(m-1)(m^2-7m+7)/3$. It follows that, also in this case, the mean number of steps in the walk is $a(n-a)$ and the variance is $\frac13(n^2-2a(n-a)-2)a(n-a)$.

[The polynomials $t_m$ and $u_m$ in this analysis are disguised relatives of the classical Chebyshev polynomials defined by $T_m(\cos\theta)=\cos m\theta$, $U_m(\cos\theta)=\sin((m+1)\theta)/\sin\theta$. Let us also write $V_m(\cos\theta)=\cos((m-\frac12)\theta)/\cos(\frac12\theta)$. Then $V_m(x)=(2-1/x)T_m(x)+(1/x-1)U_m(x)$, and we have $t_m=z^mT_m(1/z)$, $u_m=z^mV_m(1/z)$.]
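The mean $a(n-a)$ and variance $\frac13(n^2-2a(n-a)-2)a(n-a)$ of answer 105 can be checked numerically without generating functions. The sketch assumes that the walk in exercise 105 takes steps $\pm1$ with equal probability on the $n$-cycle and stops on reaching vertex 0, which is the same as a walk on $0,1,\ldots,n$ absorbed at both ends; the function name and the truncation at 4000 steps are illustrative assumptions.

```python
def absorption_time_moments(n, a, steps=4000):
    """Mean and variance of the time for a +-1 walk from a to hit 0 or n."""
    dist = [0.0] * (n + 1)
    dist[a] = 1.0
    mean = second = 0.0
    for t in range(1, steps + 1):
        new = [0.0] * (n + 1)
        for pos in range(1, n):           # only interior positions are still walking
            if dist[pos]:
                new[pos - 1] += dist[pos] / 2
                new[pos + 1] += dist[pos] / 2
        absorbed = new[0] + new[n]        # probability that the walk stops at time t
        mean += t * absorbed
        second += t * t * absorbed
        new[0] = new[n] = 0.0
        dist = new
    return mean, second - mean * mean

for n, a in [(3, 1), (4, 1), (7, 3), (8, 4)]:
    mean, var = absorption_time_moments(n, a)
    print(n, a, mean, a * (n - a), var, (n * n - 2 * a * (n - a) - 2) * a * (n - a) / 3)
```

The surviving probability decays geometrically, so the truncation error is negligible for the small values of $n$ used here.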
106. Before coalescing, the array $a_0a_1\ldots a_{d-1}$ always has the form $a^r(a+1)\ldots(b-1)b^s$ for some $0\le a<b<d$, $r>0$, and $s>0$, where $r+s+b-a=d+1$. Initially $a=0$, $b=d-1$, $r=s=1$. The behavior of the algorithm while $r+s=t$ is like a random walk on the $t$-cycle, as in the previous exercise, starting at $a=1$. Let $G_t$ be the generating function for that problem, which has mean $t-1$ and variance $2\binom t3$. Then this problem has the generating function $G_2G_3\ldots G_d$; so its mean is $\sum_{t=2}^{d}(t-1)=\binom d2$, and the variance is $\sum_{t=2}^{d}2\binom t3=2\binom{d+1}4$.

107. (a) If the probabilities can be renumbered so that $p_1\le q_1$ and $p_2\le q_2$, the five events of $\Omega$ can have probabilities $p_1$, $q_1-p_1$, $p_2$, $q_2-p_2$, and $q_3$, because $p_3=(q_1-p_1)+(q_2-p_2)+q_3$. But if that doesn't work, we can suppose that $p_1<q_1\le q_2\le q_3<p_2\le p_3$. Then $p_1$, $q_1-p_1$, $p_1+p_2-q_1$, $p_3-q_3$, and $q_3$ are nonnegative.

(b) Give $Ts events the probabilities 告，吾，吾， 吾 . 

(c) For example, let pi = ^ p 2 = P3 = •，qi = q ‘2 = ^3 = 

108. Let $p_k=\Pr(X=k)$ and $q_k=\Pr(Y=k)$. The set $\bigcup_n\{\sum_{k<n}p_k,\ \sum_{k<n}q_k\}$ divides the unit interval $[0\,..\,1)$ into countably many subintervals, which we take as the set $\Omega$ of atomic events $\omega$. Let $X(\omega)=n$ if and only if $\omega\subseteq[\sum_{k<n}p_k\,..\,\sum_{k\le n}p_k)$; a similar definition works for $Y(\omega)$. And $X(\omega)\le Y(\omega)$ for all $\omega$.

109. (a) We’re given that pi +P3 < qi + q3^ P 2 + P3 < q 2 + ^3, and P3 < ^3 - (Also 

that 0 < 0 and pi + p 2 + p 3 < + q ：2 + qz\ but those inequalities always hold.) We 

must find a coupling with pv 2 = P 21 = P3i = P32 = 0, because 1 2 2, 2 2 1, 3 2 1, and 
3 2 2. In the previous problem we were given that p 2 + P3 < Q 2 + qz and p 3 < qz, and 
we had to find a coupling with p 2 i = p3i = P32 = 0. 

(b) Let = {x \ x y a for some a G A} and = {x \ x ^ b for some b G B}. 

We’re given that Pr r (X< Pr’’（YG for all A. Let A = {1,..., n} \B 丄 ， so that 
Pr’(X G B^) = 1 — Pr^X G A). The result follows because A = . 

(c) Remove all arcs — > Xj from the network when i ^ j. Then a blocking 

pair (I, J) has the property that i -< j implies i G / or j G J. Let A = {x \ x ^ a 
for some a ^ J} and B = {1，• • • ， n} \ A. Then A C 1^ B C and B = . Hence 

^2iel SjGJ — ^ieA Vi + YjjeBqj k ^ ■*" w = 

[See K. Nawrotzki, Mathematische Nachrichten 24 (1962), 193–200; V. Strassen, Annals of Mathematical Statistics 36 (1965), 423–439.]

110. (a) The result is trivial if $r=1$. Otherwise consider the probability distributions $p_k'=(p_k-r_k)/(1-r)$ and $q_k'=(q_k-r_k)/(1-r)$; use the coupling $p_{ij}=(1-r)p_i'q_j'+r_j[i=j]$. [See W. Doeblin, Revue mathématique de l'Union Interbalkanique 2 (1938), 77–105; R. L. Dobrushin, Teoriya Veroyatnostei i ee Primeneniia 15 (1970), 469–497.]

(b) Yes, because the {p ^q) distribution satisfies the hypotheses of that exercise. 

111. (a) Here are the 60 triples 1 丌 3 丌 4 丌 , with the minima in bold type: 

134 163 123 126 142 142 153 145 163 154 245 234 534 563 623 526 632 652 534 643 

356 645 246 234 435 463 524 423 642 532 461 351 361 641 251 231 341 531 321 421 

512 412 415 315 316 615 216 216 415 316 623 526 652 452 564 354 465 364 256 265 

(b) Both $S_A$ and $S_B$ lie in $A\cup B$. Each element of $A\cup B$ is equally likely to have the minimum value $a\pi$; exactly $|A\cap B|$ of those elements have that value as their sketch.

(c) $|A\cap B\cap C|/|A\cup B\cup C|$.

Notes: The ratio $|A\cap B|/|A\cup B|$ is a useful measure of similarity called the Jaccard index, because Paul Jaccard used it to compare different Swiss sites according to the sets of plant species seen at each place [Bulletin de la Société Vaudoise des Sciences Naturelles 37 (1901), 249]. It is commonly used today to rank the similarity between web pages, based on a certain set of words in each page.

Minwise independence was introduced by Andrei Broder for that application in 1997, using $n=2^{64}$ and a method of identifying roughly 1000 words $A$ on a typical web page. By calculating, say, independent sketches $S_1(A)$, ..., $S_{100}(A)$ for each page, the number of $j$ such that $S_j(A)=S_j(B)$ gives a highly reliable and quickly computable estimate of the Jaccard index. A perfectly minwise independent family is impossible in practice when $n$ is huge, but the associated theory has suggested approximate “minhash” algorithms that work well. See A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher, J. Computer and System Sciences 60 (2000), 630–659.
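The identity behind part (b) and the notes, namely that two sketches agree with probability equal to the Jaccard index, can be checked exhaustively for a tiny universe by enumerating all $n!$ permutations (the fully random family, which is trivially minwise independent). The sketch here is taken to be the minimum image $\min_{a\in A}a\pi$; that convention, the 0-based universe, and the function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

def sketch(A, perm):
    """Minimum image of the set A under the permutation perm (a tuple)."""
    return min(perm[a] for a in A)

def sketch_agreement(A, B, n):
    """Exact Pr(S(A) = S(B)) when perm is uniform over all permutations of range(n)."""
    hits = total = 0
    for perm in permutations(range(n)):
        total += 1
        hits += sketch(A, perm) == sketch(B, perm)
    return Fraction(hits, total)

A, B = {0, 1, 2, 3}, {2, 3, 4}
jaccard = Fraction(len(A & B), len(A | B))
print(sketch_agreement(A, B, 6), jaccard)   # prints 2/5 2/5
```

In practice one would replace the exhaustive loop by a modest number of independent hash functions and count agreements, as the notes describe.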


112. (a) Such a rule breaks ties properly, provided that the number of $\pi$ with $\infty$'s in $B$ is a multiple of $n-m$. Each $B$ can have its own rule.

(b) In fact we can produce families whose permutations are all obtained from $N/n=d$ “seeds” by cyclic shifts, as in exercise 111. Begin with $m=1$ and a table of $N=\mathrm{lcm}(1,2,\ldots,n)$ partial permutations whose entries $\pi_{ij}$ for $1\le i\le N$ and $1\le j\le n$ are entirely blank, except that $\pi_{ij}=1$ for each pair $ij$ with $(j-1)d<i\le jd$ and $1\le j\le n$. When $n=4$, for instance, the initial tableau

1␣␣␣ 1␣␣␣ 1␣␣␣ ␣1␣␣ ␣1␣␣ ␣1␣␣ ␣␣1␣ ␣␣1␣ ␣␣1␣ ␣␣␣1 ␣␣␣1 ␣␣␣1

represents $N=12$ truncated permutations with $m=1$. We'll insert some 2s next.

Let $A$ be a subset of size $n-m$ that is all blank, in some $\pi$. Each $A$ occurs equally often (as in uniform probing, Section 6.4); so the number of such $\pi$ is $N/\binom nm$. Fortunately this is a multiple of $n-m$, because exercise 1.2.6–48 tells us that $N/((n-m)\binom nm)=\sum_k\binom mk(-1)^kN/(n-m+k)$.

Take $n-m$ such $\pi$ and insert $m+1$ into different positions within them. Then find another such $A$, if possible, and repeat the process until no blank subsets of size $n-m$ remain. Then set $m\leftarrow m+1$, and continue in the same way until $m=n$.

It's not hard to see that the insertions can be done so that $\pi_j$, $\pi_{d+j}$, ..., $\pi_{(n-1)d+j}$ are maintained as cyclic shifts of each other. When $n=4$ the 2s are essentially forced:


12 


lu2u luu2 ul2u ulu2 21 


12 2 U 1 


21 


But then there are two ways to fill the two cases with A = {3, 4}: 
123 u 1 U 2 U 13 u 2 u 123 u l u 2 21 u 3 3 U 12 2 U 1 U u 213 


Adopting the first of these leads to two ways to fill A = {2,4}: 


123 

123 


132 u 13 u 2 
l u 23 13 u 2 


u 

u 


123 


2uul 

• 

u2l)1 

uu21 

• 

23 u l 

u2lj1 

3 U 21 

2 U 31 

u2yl 

3 U 21 

23 u l 

32 u l 

3 U 21 

23 u l 

u 231 

3 U 21 


Here A is a cyclic shift of itself, but consistent placement is always possible. 

[See Yoshinori Takei, Toshiya Itoh, and Takahiro Shinozaki, IEICE Transactions on Fundamentals E83-A (2000), 646–655, 747–755.]


113. (a) The probability is zero if $l\ge k$ or $r>n-k$. Otherwise the result follows if we can prove it in the “complete” case when $l=k-1$ and $r=n-k$, because we can sum the probabilities of complete cases over all ways to specify which of the unconstrained elements are $<k$ and which are $>k$.

To prove the complete case, we may assume that $a_i=i$, $b=k$, and $c_j=k+j$ for $1\le i<k$ and $1\le j\le r=n-k$. The probability can be computed via the principle of inclusion and exclusion, because we know $\Pr(\min_{a\in A}a\pi=k\pi)=1/(n-k+1+t)=P_B$ whenever $A=\{k,\ldots,n\}\cup B$ and $B$ consists of $t$ elements less than $k$. For example, if $k=4$ the probability that $4\pi=4$ and $\{1\pi,2\pi,3\pi\}=\{1,2,3\}$ is $P_\emptyset-P_{\{1\}}-P_{\{2\}}-P_{\{3\}}+P_{\{1,2\}}+P_{\{1,3\}}+P_{\{2,3\}}-P_{\{1,2,3\}}$; each of those probabilities is correct for truly random $\pi$.

(b) This event is the disjoint union of complete events of type (a). [See A. Z. Broder and M. Mitzenmacher, Random Structures & Algorithms 18 (2001), 18–30.]

Notes: The function $\psi(n)=\ln(\mathrm{lcm}(1,2,\ldots,n))=\sum_{p^k\le n}[p$ prime$]\ln p$ was introduced by P. L. Chebyshev [see J. de mathématiques pures et appliquées 17 (1852), 366–390], who proved that it is $\Theta(n)$. Refinements by C.-J. de la Vallée Poussin [Annales de la Société Scientifique de Bruxelles 20 (1896), 183–256] showed that in fact $\psi(n)=n+O(ne^{-C\sqrt{\log n}})$ for some positive constant $C$. Thus $\mathrm{lcm}(1,2,\ldots,n)$ grows roughly as $e^n$, and we cannot hope to generate a list of minwise independent permutations when $n$ is large; the length of such a list is 232,792,560 already for $19\le n\le 22$.
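The list lengths mentioned in the notes are easy to reproduce. The short sketch below computes $\mathrm{lcm}(1,2,\ldots,n)$ with integer arithmetic only; `lcm_upto` is a hypothetical helper name.

```python
from functools import reduce
from math import gcd

def lcm_upto(n):
    """Return lcm(1, 2, ..., n), the table length N used in answer 112(b)."""
    return reduce(lambda a, b: a * b // gcd(a, b), range(1, n + 1), 1)

for n in (4, 18, 19, 22, 23):
    print(n, lcm_upto(n))
```

The value is constant for $19\le n\le 22$ because 20, 21, and 22 contribute no new prime powers, and it jumps by a factor of 23 at the next prime.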

114. First assume that \Sj\ = dj +1 for all j，and let Qj{x) = Y\{(x — s) | s G Sj}. We 

can replace x， +1 by gj(Xj), without changing the value of f(xi, •… ， x n ), when Xj G Sj. 
Doing this repeatedly until every term of f has degree < dj in each variable Xj will 
produce a polynomial that has at least one nonroot in x • • • x according to 
exercise 4.6.1—16. [See N* Alon, Combinatorics, Probab. and Comput. 8 (1999), 7-29.] 

Now in general, if there were at most \Si \ + … + |iS n | — (di + …+ d n + n) 
nonroots, we could find subsets Sj C Sj with \Sj\ = dj + 1 such that Sj differs from Xj 
in \Sj \ — dj — 1 of the nonroots and S[ x • • • x S f n avoids them all — a contradiction. 

(This inequality also implies stronger lower bounds when the sets $S_j$ are large. If, for example, $d_1=\cdots=d_n=d$ and if each $|S_j|\ge s$, where $s=d+1+\lceil d/(n-1)\rceil$, we can decrease each $|S_j|$ to $s$ and increase the right-hand side. For further asymptotic improvements see Béla Bollobás, Extremal Graph Theory (1978), §6.2 and §6.3.)

115. Representing the vertex in row x and column y by (x,y), if all points could be 

covered we’d have f(x,y)= ]Yj=i( x - a j) I[ q j=i(y- b j) IVj=i( x ^y+ c j)l x -y^ d j) = 
for all 1 < x < m and 1 < ?/ < n and for some choices of bj, Cj ， dj. But f has 

degree p + g + 2r = m + n — 2, and the coefficient of x m ~ 1 y n ~ 1 is 士 ( 卜 〉 2 」) — 

116. Let $g_v=\sum\{x_e\mid v\in e\}$ for each vertex $v$, including $x_e$ twice if $e$ is a loop from $v$ to itself. Apply the nullstellensatz with $f=\prod_v(1-g_v^{p-1})-\prod_e(1-x_e)$ and with each $S_j=\{0,1\}$, using mod $p$ arithmetic. This polynomial has degree $m$, the number of edges and variables, because the first product has degree $(p-1)n<m$; and the coefficient of $\prod_ex_e$ is $(-1)^{m+1}\ne 0$. Hence there is a solution $x$ that makes $f(x)$ nonzero. The subgraph consisting of all edges with $x_e=1$ in this solution is nonempty and satisfies the desired condition, because $g_v(x)\bmod p=0$ for all $v$.

(This proof works also if we consider that a loop contributes just 1 to the degree. See N. Alon, S. Friedland, and G. Kalai, J. Combinatorial Theory B37 (1984), 79–91.)

117. If $\omega=e^{2\pi i/m}$, we have $\mathrm{E}\,\omega^{jX}=\sum_k\binom nkp^k(1-p)^{n-k}\omega^{jk}=(\omega^jp+1-p)^n$. Also $|\omega^jp+1-p|^2=p^2+(1-p)^2+p(1-p)(\omega^j+\omega^{-j})=1-4p(1-p)\sin^2(\pi j/m)$. Now $\sin\pi t\ge 2t$ for $0\le t\le 1/2$. Hence, if $0\le j\le m/2$ we have $|\omega^jp+1-p|^2\le 1-16p(1-p)j^2/m^2\le\exp(-16p(1-p)j^2/m^2)$; if $m/2\le j\le m$ we have $\sin(\pi j/m)=\sin(\pi(m-j)/m)$. Thus $|\mathrm{E}\,\omega^{jX}|\le\exp(-8p(1-p)j^2n/m^2)$ when $0\le j\le m/2$, and the same bound holds with $m-j$ in place of $j$ otherwise.

The result follows, since $\Pr(X\bmod m=r)=\frac1m\sum_{j=0}^{m-1}\omega^{-jr}\,\mathrm{E}\,\omega^{jX}$. [S. Janson and D. E. Knuth, Random Structures & Algorithms 10 (1997), 130–131.]
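The two facts used in answer 117, the discrete Fourier inversion formula for $\Pr(X\bmod m=r)$ and the exponential bound on $|\mathrm{E}\,\omega^{jX}|$, can be checked numerically for a binomially distributed $X$. The parameter values and function names below are illustrative assumptions.

```python
import cmath
from math import comb, exp, pi

def residue_probability_direct(n, p, m, r):
    """Pr(X mod m = r) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, n + 1, m))

def residue_probability_fourier(n, p, m, r):
    """The same probability as (1/m) * sum_j w^(-j r) * (w^j p + 1 - p)^n."""
    w = cmath.exp(2j * pi / m)
    total = sum(w**(-j * r) * (w**j * p + 1 - p)**n for j in range(m))
    return (total / m).real

def characteristic_bound_holds(n, p, m, j):
    """Check |E w^(jX)| <= exp(-8 p (1-p) j^2 n / m^2) for 0 <= j <= m/2."""
    w = cmath.exp(2j * pi / m)
    return abs((w**j * p + 1 - p)**n) <= exp(-8 * p * (1 - p) * j * j * n / m**2) + 1e-15

n, p, m = 40, 0.3, 7
for r in range(m):
    print(r, residue_probability_direct(n, p, m, r), residue_probability_fourier(n, p, m, r))
print(all(characteristic_bound_holds(n, p, m, j) for j in range(m // 2 + 1)))
```

Because the terms with $j\ne 0$ are exponentially small, each residue probability is close to $1/m$ once $n$ is large compared with $m^2$.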
118. Indeed, (22) with $Y=X-x$ yields more (when we also apply exercise 47):

$$\Pr(X\ge x)\ge\Pr(X>x)\ge\frac{(\mathrm{E}\,X-x)^2}{\mathrm{E}(X-x)^2}=\frac{(\mathrm{E}\,X-x)^2}{\mathrm{E}\,X^2-x(2\,\mathrm{E}\,X-x)}\ge\frac{(\mathrm{E}\,X-x)^2}{\mathrm{E}\,X^2-x\,\mathrm{E}\,X}\ge\frac{(\mathrm{E}\,X-x)^2}{\mathrm{E}\,X^2-x^2}.$$

(The attribution of this result to Paley and Zygmund is somewhat dubious. They did, however, write an important series of papers [Proc. Cambridge Philosophical Society 26 (1930), 337–357, 458–474; 28 (1932), 190–205] in which a related inequality appeared in the proof of Lemma 19.)

119. Let = Pr(U < V < W and V < (1 - t)U-h tW), g(x,t) = Pr(U <W <V 

and W < {I-t)U^r tV), h(x,t) = Pr(W < U < V smd U < (1_- t)WtV). We want 
to prove that f(x,t)+ g(x^ t) + h(x ， t) = L Notice that，if U =1 — U^V=1 — 

W =1-W, we_h_ave Pt(W < t/ < F and f/ > (1 - t)WtV) = Pr(V < F < W and 
U < tV + — Hence f — h(x,t) = /(l —x，1 — 尤)， and we may assume that t < x. 

Clearly g(x,t) = t(v — u)= 专 . And t < x implies that 

/(M) = / ( ：_, )/(1 _,) f JT 一 ㈣ ^(l-(v-(l-t)u)/t) = t 2 (l- x) 2 /(6(l-t)x); 
h(x,t) = Jl + C iMn)) = f -/OM). 



Instead of this elaborate calculation, Tam as Terpai has found a much simpler 
proof: Let A = M = (UVW), and Z = max([7, F, W). Then the 

conditional distribution of M, given A and Z, is a mixture of three distributions: 
Either A = U^ Z = V^ and M is uniform in [A.. Z]; or A = Z = W^ and M is 
uniform in [x .. Z\\ or A = W^ Z = V^ and M is uniform in [A .. x]. (These three cases 
occur with respective probabilities (Z — Z — x^x — A)/(2Z — 2A )， but we don’t need 
to know that detail-) The overall distribution of M, being an average of conditional 
uniform distributions over dll A < x and Z > is therefore uniform. 

[See S. Volkov, Random Struct. & Algorithms 43 (2013), 115–130, Theorem 5.]

120. See J. Jabbour-Hattab, Random Structures & Algorithms 19 (2001), 112–127.

121 . (a) DQ/IW = I lg f + 咅 lg ! « .0097; D ㈤ |") = i lg f + ! lg f « .0098. 

(b) We have $\mathrm{E}(\rho(X)\lg\rho(X))\ge(\mathrm{E}\,\rho(X))\lg\mathrm{E}\,\rho(X)$ by Jensen's inequality (20); and $\mathrm{E}\,\rho(X)=\sum_ty(t)=1$, so the logarithm evaluates to 0.

The question about zero is the hard part of this exercise. We need to observe that the function $f(x)=x\lg x$ is strictly convex, in the sense that equality holds in (19) only when $x=y$. Thus we have $(\mathrm{E}\,Z)\lg\mathrm{E}\,Z=\mathrm{E}(Z\lg Z)$ for a positive random variable $Z$ only when $Z$ is constant. Consequently $D(y\|x)=0$ if and only if $x(t)=y(t)$ for all $t$.

(c) Let x{t) = x{t)/p and y{t) = y{t)/q be the distributions of X and Y within T. 

Then 0 < D(y\\x) = k (⑽ = E(lg p(Y) \ Y G T)\g(p/q). 

(d) $D(y\|x)=(\mathrm{E}\lg m)-H_Y=\lg m-H_Y$. (Hence, by (b), the maximum entropy of any such random variable $Y$ is $\lg m$, attainable only with the uniform distribution.)
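Part (d) can be illustrated with a few lines of code. The sketch uses the definition $D(y\|x)=\sum_ty(t)\lg(y(t)/x(t))$ suggested by part (b), with the usual convention that terms having $y(t)=0$ contribute nothing; the example distribution and the function names are assumptions chosen for illustration.

```python
from math import log2

def kl_divergence(y, x):
    """D(y||x) = sum_t y(t) lg(y(t)/x(t)) in bits; terms with y(t) = 0 contribute 0."""
    return sum(yt * log2(yt / xt) for yt, xt in zip(y, x) if yt > 0)

def entropy(y):
    """H_Y = sum_t y(t) lg(1/y(t)) in bits."""
    return sum(yt * log2(1 / yt) for yt in y if yt > 0)

y = [0.5, 0.25, 0.125, 0.125]
m = len(y)
uniform = [1 / m] * m

print(kl_divergence(y, uniform), log2(m) - entropy(y))   # the two sides of (d)
print(kl_divergence(uniform, y))                         # D is not symmetric
```

The first line shows the identity $D(y\|x)=\lg m-H_Y$ for uniform $x$; the second is a reminder that swapping the arguments generally changes the value.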

(e) Ix,Y = -Hz - + En x ^ u ) W) + 

Yjv y( v )^(^/y( v ))^ because Y, v z ( u ^ v ) = x ( u ) and Y, u z ( u ^ v ) = y( v )- (One can also 
write Ix，y = H Y - Hy\x, where H Y \x = x(t)ff Y 卜 ) 

122 . (a) D(y\\x) = k(^ t+1 ) = lg i « 0.755; D(x\\y) = lg 卜 0.415. 
(b) Let $q=1-p$ and $t=pn+u\sqrt n$. Then we have


y(t) 


-u 2 /(2pq) 

J 2irpqn 


exp 


u u u 3 u 3 \ 1 

2q 2p ^ 6 p 2 6q 2 ) yjn 


(i) 


lnp ⑷ 




丄 , u 、丄 

2 ln ^l^~67J^ + 


s. 


By restricting \u\ <n e and trading tails (see 7.2.1.5—( 20 )), we obtain 


D (y\\x) 


27Vpqn J - 



-u 2 /(2pq) 


2q In 2 


lgq) duyjn-\r 


21 n 2 


1)0 


In this case D{x\\y) is trivially 00 , because x{n + 1) > 0 but y{n + 1) 


123. Since $p_{k+1}=p_ky(t)/z_k(t)$ we have $\rho(t)=(1-p_k)p_{k+1}/(p_k(1-p_{k+1}))$. [This relation was the original motivation that led S. Kullback and R. A. Leibler to define $D(y\|x)$, in Annals of Mathematical Statistics 22 (1951), 79–86.]

124. Let m = C 2 2 D ⑷ I®) and g(t) = f{t)[p(t) < m]; thus g(t) = f(t) except with prob¬ 
ability A c . Wehave|E(/)-£； n (/)| = ( 五 (/)-& ⑹ )+ | 丑⑹ - E n ( P )| + (E n (/)-E n ⑷). 
The Cauchy-Schwarz inequality (exercise 1-2.3-30) implies that the first and last are 
bounded by 11 /| I \/A 7 , because f(t) — g(t) = f(t)[p(t) > m]. 

Now var(p(X)^(X)) < E(p(X) 2 g(X) 2 ) < mE(p(X)f(X) 2 ) = mE(f(Y) 2 )= 
m||/|| 2 . Hence (E(g) - E n (g)) 2 = var E n (g) = var(p(X)p(X))/n < ||/|| 2 /c 2 . 

Consider now the case c < 1. From Markov’s inequality we have Pr(p(X) > m) < 
(E p(X))/m = 1/m. Also E(p(X)[p(X) < m}) = E[p(Y) <m] = 1 — A c . Consequently 
Pr(E n (l) > a) < Pr(maxi< fc < n p(Xa.) > m) + Pr (^^ =1 p(X k )[p(X k ) <m] > na) < 
n/m-\-E(J2l =1 p(X k )[p(X k ) <m])/(na) = c 2 + (1 - A c )/a. 

[S. Chatterjee and P. Diaconis, preprint (September 2015).]
He writes indexes to perfection.
— OLIVER GOLDSMITH, Citizen of the World (1762) 

When an index entry refers to a page containing a relevant exercise, see also the answer to 
that exercise for further information. An answer page is not indexed here unless it refers to a 
topic not included in the statement of the exercise. 


γ (Euler's constant), 29.
ν(x) (sideways sum), 13.
π (circle ratio), 14.

φ (golden ratio), 12, 26.

A priori versus a posteriori probabilities, 25.
a.s.: Almost surely, 11–12, 20, 21, 39, 40.

Ace Now, 8, 19. 

Ahlswede, Rudolph, 34. 

Aldous, David John, 33. 

Almost sure events, 11 • 

Alon，Noga m”) ， 44. 

Analysis of algorithms ， 9, 21, 22. 
Arithmetic-geometric mean inequality, 

31 ， 39. 

Asymptotic methods, 11, 12, 16. 

Atomic events, 1. 

Azuma，Kazuoki (吾妻 一 興 ) ， 9, 20. 

B(jn” • • ， Pm )，see Multivariate Bernoulli 
distribution. 

B n (p) 5 see Binomial distribution. 

see Cumulative binomial 
distribution. 

Backward versus forward, 21. 

Ballot numbers, 39. 

Bayes, Thomas, 14* 

BDD (binary decision diagram), 5, 34. 

Bell, Eric Temple, numbers zu n , 15. 
Bernoulli, Daniel, 36- 
Bernoulli, Jacques (= Jakob = James), 
distribution, multivariate, 14, 18, 20. 
Bernoulli，Nicolas (= Nikolaus), 36. 

Beta distribution, 14. 

Bhatia，Rajendra (TT^^ 28. 

Bienayme, Irenee Jules, inequality, 4. 
Bin-packing problem, 11, 20. 

Binary notation, 14. 

Binary random variables, 2, 3, 5, 13—15, 20. 
Binary search trees, 24. 

Bingo, 12-13. 

Binomial distribution, 14, 24, 32. 

cumulative, 14—15, 31- 
Bit vectors, 3, 9, 13-14. 

Bits of information, 24. 

Blackwell, David Harold, 37- 
Bollobas, Bela, 44. 


Boolean functions, 5, 15, 33, 35- 
dual of, 33- 
monotone, 5, 35. 
symmetric, 16. 

Boolean random variables, see Binary 
random variables. 

Boolean vectors, see Bit vectors. 

Boyd, Stephen Poythress, 33. 

Bracket notation, 2. 

Bracketing property, 34. 

Broder，Andrei Zary ， 43, 44. 

Brown, John O’Connor, 26. 

Cantelli，Francesco Paolo, inequality, 33. 
Casanova de Seingalt, Giacomo 
Girolamo, 36. 

Catalan, Eugene Charles, numbers, 36. 
Cauchy, Augustin Louis, inequality, 46- 
Chain rule for conditional probability, 

14, 28. ^ 

Charikar, Moses Samson (h!M 
1 0>K), 43. 

Chatterjee, Sourav (aW) 46- 

Chebyshev (= Tschebyscheff)，Pafnutii 
Lvovich (HeGMineBi, riac^HyTifi 
JlBBOBHHTb = He&blineB ， na<J)HyTHH 

JIkbobhh), 33, 44. 
inequality, 4, 9, 16. 
monotonic inequality, 35. 
polynomials, 27, 41. 

Chesterton, Gilbert Keith, 26. 

Chicks, 15. 

Circle ratio (7r), 14. 

Cliques, 16-17- 

CMath: Concrete Mathematics^ a book 
by R. L. Graham, D. E. Knuth, and 
O. Patashnik, 27, 29. 

Coalescing random walk, 21. 

Coin tosses, 11-12, 19, 20. 

Column sums, 22. 

Combinatorial nullstellensatz, 23. 
Commutative law, 41. 

Compositions, 37- 
Concave functions ， 4, 32. 

Conditional distribution, 3, 45. 
Conditional expectation, 2—3, 15-19. 
inequality, 5, 16, 34. 

Conditional probability, 1, 13—14, 35, 45- 
Connelly, Robert, Jr., 38.

Convex combinations, 33. 

Convex functions ， 4, 8 ， 16, 20, 33, 38, 39. 
strictly, 45- 

Correlated random variables, 17—18, 28, 37. 
Correlation inequalities, 17- 
Coupling, 22. 

from the past, 41. 

Covariance, 2 ， 14 ， 17, 28, 35. 

Cover, Thomas Merrill^ 13, 28. 

Covering all points, 23. 

Cumulative binomial distribution, 14—15, 31. 
Cycle graph (C n ) ， 22, 41. 

Darwin, Charles Robert, 25- 
Davis, Chandler, 28. 

Day kin, David Edward, 34. 
de La Vallee Poussin, Charles Jean 
Gustave Nicolas, 44. 
de Moivre, Abraham, 37. 
martingale, 19. 

de Montmort, Pierre Remond, 36- 
Density, relative, 24* 

Degree of a multivariate polynomial ， 23. 
Diaconis, Persi Warren, iv, 46- 
Diagonal lines, 23- 
Dice, 12, 24. 

Discrete probabilities, L 
Doblin, Wolfgang (= Doeblin ， Vincent )， 42* 
Dobrushin, Roland LVovich (^UpSpynmH, 
PojiaH^ JIbbobhh), 42. 

Doob，Joseph Leo, 6 , 9, 38. 

martingales, 9-10, 20, 37, 39. 

Dual of a Boolean function, 33- 

Eggenberger, Florian, 6 - 
Elton, John Hancock, 31- 
Entropy, 24. 

relative, 24. 

Enveloping series, 34, 

Erdos, Pal (= Paul), 34. 

Etesami, Omid ( 1 iv. 

Euler，Leonhard (Eibiepi, JleoHap^ = 
9fijiep, JleoHapA), constant 7 , 29. 
Eulerian numbers, 37. 

Events, 1-3. 

Exchangeable random variables, 37- 
Expected value, 2—5, 14—16, see also 
Conditional expectation. 

Fair sequences, 7, 10, 19, 38. 

with respect to a sequence, 7, 37. 

Families of sets, 17- 
Feige, Uriel twniN), 3L 

Feller, Willibald (= Vilim = Willy = 
William), 36. 

Fibonacci, Leonardo, of Pisa (= Leonardo 
filio Bonacii Pisano), dice, 12. 
martingale, 19. 
numbers, 12, 38- 


First moment principle, 4, 16. 

FKG inequality, 5, 17, 35- 
Flow in a network, 22. 

Fortuin, Cornells Marius, 17. 

Forward versus backward, 21 • 

Four functions theorem, 17, 34. 

Friedland，Shmuel 汐 ) ， 44. 

Friedman, Bernard, urn, 19. 

Frieze, Alan Michael, 43. 

Games, 4, 8 , 13. 

Garey, Michael Randolph, 11 • 

Generating functions, 15, 22, 24, 32, 

33, 36, 40, 41. 

Generation of random objects, 41. 

Geometric distribution, 21, 24. 

Geometric mean and arithmetic mean, 

31 ， 39. 

Georgiadis, Evangelos (rscop 了 ta5r^, 
EuayysXo (；) 5 28. 

Gilat ， David, 38. 

Ginibre, Jean, 17- 
Golden ratio (0), 12, 26- 
Goldsmith ， Oliver, 47. 

Gosper, Ralph William, Jr., 28. 

Graham，Ronald Lewis (葛立恒 )， 47- 
Grid, 23. 

Grimmett, Geoffrey Richard, 32. 

Gumball machine problem, 31- 

HAKMEM, 28. 

Harmonic numbers, fractional, 38. 

Hashing, 9-10, 20. 

Hoeffding, Wassily, 9, 15- 
inequality, 9-10, 20. 

Importance sampling, 25. 

Inclusion and exclusion, 31, 34, 44. 
Incomplete beta function, 14. 

Independent events, 2. 

Independent random variables, 1, 7, 9, 

10, 13-15, 20, 37.

k-wise, 1, 13.
Infinite mean, 36, 38. 

Information, bits of, 24. 

Information gained, 25.
Integer multilinear representation, see
Reliability polynomials.
Internet, ii, iii. 

Inverses, 36. 

Itoh, Toshiya (伊東利哉), 43.

Jabbour-Hattab, Jean, 45.

Jaccard, Paul, 42.
index, 42-43. 

James White, Phyllis Dorothy, 11. 

Janson, Carl Svante, iv, 38, 40, 44. 

Jensen, Johan Ludvig William Valdemar, 33. 
inequality, 4, 16, 33, 38, 45.
Johnson, David Stifler, 11.

Joint distribution, 13, 24, 35. 

Joint entropy, 24. 

k-cliques, 17.

k-wise independence, 1, 13.
Kalai, Gil, 44.

Kallenberg, Olav Herbert, 37. 

Karp, Richard Manning, 40. 

Kasteleyn, Pieter Willem, 17. 

Kendall, David George, 37. 

Knuth, Donald Ervin (高德纳), i, iii,
iv, 37, 40, 44, 47.

Kolmogorov, Andrei Nikolaevich
(Колмогоров, Андрей Николаевич), 9.
inequality, 9.
Kullback, Solomon, 46.

divergence, D(y||x), 24-25.

La Vallée Poussin, Charles Jean Gustave
Nicolas de, 44. 

Lake Wobegon dice, 12. 

Large deviations, see Tail inequalities. 

Larrie, Cora Mae, 21. 

Least common multiple, 23.
Leibler, Richard Arthur, 46.
divergence, D(y||x), 24-25.

Lipschitz, Rudolph Otto Sigismund, 
condition, 10.
Loaded dice, 24. 

Loop, running time of, 21.

Loops from a vertex to itself, 24.

Lord, Nicholas John, 30. 

Lukács, Eugene (= Jenő), 28.

MacQueen, James Buford, 38. 

Magic masks, 28. 

Mahler, Kurt, 28. 

Markov (= Markoff), Andrei Andreevich 
(Марков, Андрей Андреевич),
the elder, 4. 

inequality, 4, 5, 16, 38, 39, 46. 

Martingale differences, see Fair sequences.
Martingales, 6-11, 18-20, 24, 32.

with respect to a sequence, 7, 19, 37. 
Max-flow min-cut theorem, 22, 42.

Maximal inequality, 8-9, 20. 

McDiarmid, Colin John Hunter, 10. 

Median value of a random variable, 14, 24. 
Mengden, Nicolai Alexandrovitch von
(Николай Александровичъ), 25.

Method of bounded differences, 10.
Minhash algorithms, 43. 

Minterms, 31. 

Minwise independent permutations, 23. 
Mitzenmacher, Michael David, 43, 44. 
Moivre, Abraham de, 19, 37.


Monotone Boolean functions, 5, 35.
Monotone Monte Carlo method, 41. 
Montmort, Pierre Rémond de, 36.

Monus operation, 21-22. 

Moraleda Olivan, Jorge Alfonso, 27. 

Morse, Harold Calvin Marston, constant, 28. 
Motwani, Rajeev, 40.

MPR: Mathematical Preliminaries Redux, v. 
Multigraphs, 24. 

Multiplicatively fair sequences, 19. 
Multivariate Bernoulli distribution,

14, 18, 20. 

Multivariate total positivity, see FKG 
inequality.

Mutual information, 24. 

NanoBingo, 12-13. 

Nawrotzki, Kurt, 42. 

Negative binomial distribution, 
cumulative, 14. 

Negatively correlated random variables, 

18, 29. 

Neumann, Peter, 30. 

Neville-Neil, George Vernon, III, 50.
Newton, Isaac, 30. 

Nonnegative submartingales, 9, 38. 
Nonnegatively correlated random 
variables, 17. 

Nontransitive dice, 12. 

NP-complete problems, 11. 

Nullstellensatz, combinatorial, 23. 

One-sided estimates, 16.
Optional stopping principle, 8, 38. 

Order ideals, 42. 

℘ (power set, the family of all subsets), 34.
Pairwise independent random variables, 

1, 13.

Paley, Raymond Edward Alan Christopher, 
24, 45. 

Paradoxes, 12, 13, 36. 

Parity number, 28. 

Parity of a binary integer, 13. 

Partial ordering, 22. 

Patashnik, Oren, 47. 

Pearson, Karl (= Carl), 29.
Pi (π), 14.

Pitman, James William, 30.
Pittel, Boris Gershon (Питтель, Борис
Гершонович), 40.
Playing cards, 1, 8, 14, 19. 

Poisson, Simeon Denis, distribution, 

15, 24, 35. 
trials, 35.

Pólya, György (= George), 6, 19.
urn model, iv, 6-7, 19-20, 38.

Polynomials, 23; see also Chebyshev
polynomials, Reliability polynomials.
Positively correlated random variables,

17, 28. 

Power series, 33. 

Prime implicants of a Boolean function, 

5, 34. 

Probability estimates, 3-5, 8-9, 16.
Probability generating functions, 15, 40. 
Probability spaces, 1-2. 

Propp, James Gary, 41. 

q.s.: Quite surely, 12, 20, 21.

Quick, Jonathan Horatio, 18. 

Quite sure events, 12, 27. 

Raghavan, Prabhakar, 40.

Random bits, 2, 3, 5, 9, 13-15, 36. 

Random graphs, 16, 18.
Random number generators, 41. 

Random permutations, 15.
Random variables, 1-21. 

Random walk, 18. 
coalescing, 21. 
on r-cycle, 22, 42. 

Randomized algorithms, 21. 

Recurrence relations, 36, 41. 

Regular graphs and multigraphs, 24. 
Reliability polynomials, 5, 15, 16. 

Rémond de Montmort, Pierre, 36.

Rényi, Alfréd, 34.

Restricted growth strings, 15. 

Ross, Sheldon Mark, iv, 5, 33.
Row sums, 22. 

Runs of a permutation, 37. 

S≥m (a symmetric threshold function), 16.
Samuels, Stephen Mitchell, 15, 31.
Saturating addition and subtraction, 21-22.
Savage, Richard Preston, Jr., 26. 

Say Red, 38. 

Schroeppel, Richard Crabtree, 28. 
Schulte-Geers, Ernst Franz Fred, iv,
13, 31, 39.

Schwarz, Karl Hermann Amandus, 
inequality, 46. 

Second moment principle, 4, 16, 24, 33. 

Set partitions, 30, 32. 

Sets, represented as integers, 35.
Shinozaki, Takahiro (篠崎隆宏), 43.
Shuffles, 1.

Sideways sum (ν), 13.

Sketches, 23. 

St. Petersburg paradox, 36.
Stadje, Gert Wolfgang, 39.

Stirzaker, David Robert, 32.

Stopping rules, 7, 8, 19-20. 

Stork, David Geoffrey, 27.
Strassen, Volker, 42.

Stross, Charles David George, iii. 
Subadditive law, 11. 

Submartingales, 8-9, 20. 

Submodular set functions, 35.
Subsequence of a martingale, 18.
Summation by parts, 38. 

Supermartingales, 8, 40. 

Superpolynomially small, 12, 40. 